[ 
https://issues.apache.org/jira/browse/HADOOP-19071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819215#comment-17819215
 ] 

ASF GitHub Bot commented on HADOOP-19071:
-----------------------------------------

steveloughran commented on PR #6537:
URL: https://github.com/apache/hadoop/pull/6537#issuecomment-1956544980

   This is not good.
   
   But looking at the failures I don't know whether to categorise them as "test 
runner regression" or "brittle tests failing under the new test runner".
   
   Here are some of the ones I've looked at:
   
   
   `TestDirectoryScanner.testThrottling`
   This test measures how long things take; it is far too brittle against 
timing changes, both slower and faster.
   ```
   java.lang.AssertionError: Throttle is too permissive
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at 
org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:901)
   ```
   
   I think the first step here is to move to AssertJ so asserts fail with meaningful 
messages, then see if the failure can be understood. Ideally you'd want a test which 
doesn't measure elapsed time, but instead uses counters in the code (here: of 
throttle events) to assert what took place.
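   A counter-based check could be sketched roughly like this. This is a minimal, 
hypothetical illustration with a hand-rolled counter, not the actual 
DirectoryScanner API; the AssertJ-style assertion is shown in a comment.
   ```java
   import java.util.concurrent.atomic.AtomicLong;

   // Sketch of asserting on throttle *events* rather than elapsed wall-clock
   // time. onThrottle() stands in for a hypothetical hook in the scanner's
   // throttler; it is not an existing DirectoryScanner method.
   public class ThrottleCounterSketch {
       static final AtomicLong throttleEvents = new AtomicLong();

       static void onThrottle() {
           throttleEvents.incrementAndGet();
       }

       public static void main(String[] args) {
           // Simulate a scan in which the throttler kicked in three times.
           for (int i = 0; i < 3; i++) {
               onThrottle();
           }
           long n = throttleEvents.get();
           // With AssertJ this would be:
           //   assertThat(n).describedAs("throttle events during scan").isGreaterThan(0);
           if (n <= 0) {
               throw new AssertionError("expected throttle events during scan, got " + n);
           }
           System.out.println("throttle events: " + n);
       }
   }
   ```
   A failure then reports the actual event count instead of "Throttle is too 
permissive", and the test no longer depends on how fast the machine is.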
   
   Test `TestBlockListAsLongs.testFuzz`
   
   I see this painfully often elsewhere; it means that the protobuf lib was 
built with a more recent version of Java 8 than the early Oracle ones. It's 
fixable in your own build (use the older one) or by casting ByteBuffer to Buffer; 
otherwise we need to make sure tests run on a more recent build.
   
   ```
   java.lang.NoSuchMethodError: 
java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
        at 
org.apache.hadoop.thirdparty.protobuf.IterableByteBufferInputStream.read(IterableByteBufferInputStream.java:143)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.read(CodedInputStream.java:2080)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.tryRefillBuffer(CodedInputStream.java:2831)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.refillBuffer(CodedInputStream.java:2777)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readRawByte(CodedInputStream.java:2859)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readRawVarint64SlowPath(CodedInputStream.java:2648)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readRawVarint64(CodedInputStream.java:2641)
        at 
org.apache.hadoop.thirdparty.protobuf.CodedInputStream$StreamDecoder.readSInt64(CodedInputStream.java:2497)
        at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:419)
        at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:397)
        at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder.getBlockListAsLongs(BlockListAsLongs.java:375)
        at 
org.apache.hadoop.hdfs.protocol.TestBlockListAsLongs.checkReport(TestBlockListAsLongs.java:156)
        at 
org.apache.hadoop.hdfs.protocol.TestBlockListAsLongs.testFuzz(TestBlockListAsLongs.java:139)
   ```
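   The cast workaround mentioned above looks like this; a minimal standalone 
sketch, where in real code the cast goes wherever `position(int)`, `limit(int)` 
and friends are invoked on a `ByteBuffer`:
   ```java
   import java.nio.Buffer;
   import java.nio.ByteBuffer;

   // Java 9 added covariant overrides such as ByteBuffer.position(int), so
   // bytecode compiled against a newer JDK references
   // ByteBuffer.position(I)Ljava/nio/ByteBuffer;, which does not exist on
   // Java 8. Calling through the Buffer supertype compiles against the
   // Buffer method, which exists on every release.
   public class BufferCastSketch {
       public static void main(String[] args) {
           ByteBuffer buf = ByteBuffer.allocate(16);
           ((Buffer) buf).position(4);   // safe on Java 8 and later
           ((Buffer) buf).limit(8);      // same pattern for limit/flip/rewind
           System.out.println("position=" + buf.position() + " limit=" + buf.limit());
       }
   }
   ```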
   
   Test `TestDFSAdmin.testDecommissionDataNodesReconfig`
   ```
   java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at 
org.apache.hadoop.hdfs.tools.TestDFSAdmin.testDecommissionDataNodesReconfig(TestDFSAdmin.java:1356)
   ```
   Not a very meaningful message. I suspect that a different ordering of the 
threads is causing the assert to fail.
   1. Move to AssertJ.
   2. Analyse the error, see what the fix is.
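   As a small illustration of point 1, with hypothetical values and a plain-Java 
stand-in (the AssertJ call itself appears only in a comment):
   ```java
   // Why a bare assertTrue is hard to debug: the failure above carries no
   // values. An AssertJ assertion such as
   //   assertThat(actual).describedAs("decommissioned live nodes").isEqualTo(expected);
   // reports both numbers on failure. This stand-in does the same by hand.
   public class AssertMessageSketch {
       public static void main(String[] args) {
           int expected = 1;   // hypothetical expected decommissioned count
           int actual = 1;     // hypothetical value read back from the cluster
           if (actual != expected) {
               throw new AssertionError(
                   "decommissioned nodes: expected " + expected + " but was " + actual);
           }
           System.out.println("assertion passed with " + actual + " node(s)");
       }
   }
   ```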
   
   Test `TestCacheDirectives`. 
   
   ```
   at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:403)
        at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:362)
        at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.waitForCachedBlocks(TestCacheDirectives.java:760)
        at 
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.teardown(TestCacheDirectives.java:173)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   ```
   
   This is a timeout during teardown; after it, subsequent tests are possibly 
going to fail. No obvious cause, though again I'd suspect race conditions.
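   For reference, `GenericTestUtils.waitFor` is essentially a poll-until-true 
helper, and the trace above is that helper giving up after its deadline. A 
minimal sketch of the pattern (not the actual Hadoop implementation):
   ```java
   import java.util.function.Supplier;

   // Sketch of a waitFor-style helper: re-check a condition at a fixed
   // interval until it holds or the deadline passes, then fail loudly.
   public class WaitForSketch {
       static void waitFor(Supplier<Boolean> check, long intervalMs, long timeoutMs)
               throws InterruptedException {
           long deadline = System.currentTimeMillis() + timeoutMs;
           while (!check.get()) {
               if (System.currentTimeMillis() > deadline) {
                   throw new IllegalStateException("Timed out waiting for condition");
               }
               Thread.sleep(intervalMs);
           }
       }

       public static void main(String[] args) throws InterruptedException {
           long start = System.currentTimeMillis();
           // Condition becomes true after ~50 ms; waitFor polls until then.
           waitFor(() -> System.currentTimeMillis() - start > 50, 10, 1000);
           System.out.println("condition satisfied");
       }
   }
   ```
   A race that leaves cached blocks in an unexpected state keeps the condition 
false, so the helper times out in teardown exactly as seen here.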
   
   Rather than say "hey, let's revert", I'd propose treating this as "surefire 
update triggers test failures" and seeing what can be done about addressing them, 
because we can't stay frozen on surefire versions.
   




> Update maven-surefire-plugin from 3.0.0 to 3.2.5      
> -------------------------------------------------
>
>                 Key: HADOOP-19071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19071
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build, common
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
