[ 
https://issues.apache.org/jira/browse/SUREFIRE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255829#comment-17255829
 ] 

Jingfei Hu edited comment on SUREFIRE-1808 at 12/29/20, 10:12 AM:
------------------------------------------------------------------

[~tibordigana] Thanks a lot for your reply, but i still think the call stack 
should provide me some trace to root cause. And we are finally able to get a 
dump of the parent process (vs. forked process) and locate where the heap is 
occupied. See below screenshot.

!image-2020-12-29-17-14-42-884.png|width=585,height=239!

We can see it's an object of 
org.apache.maven.plugin.surefire.booterclient.output.MultipleFailureException 
which is a subclass of IOException consumes a lot of memory. And we keep 
looking into what is inside of the object and find out the content is all the 
logs printed out by test execution in the form of thousands of RuntimeException 
objects residing the in the linked queue of MultipleFailureException. Below is 
the method _getLocalizedMessage_ of MultipleFailureException which I think 
contributes to OOM

 
{code:java}
@Override
public String getLocalizedMessage()
{
    StringBuilder messages = new StringBuilder();
    for ( Throwable exception : exceptions )
    {
        if ( messages.length() != 0 )
        {
            messages.append( '\n' );
        }
        String message = exception.getLocalizedMessage();
        messages.append( message == null ? exception.toString() : message );
    }
    return messages.toString();
}
{code}
 

Based on the dump, below is my detailed analysis with the sequence diagram.

!image-2020-12-29-17-18-56-557.png!

Followed by _ThreadedStreamConsumer#run_ method,

!image-2020-12-29-17-19-51-660.png|width=606,height=370!

Here is the key code snippets.

_ThreadedStreamConsumer#close()_
{code:java}
@Override
public void close()
        throws IOException
{
    if ( stop.compareAndSet( false, true ) )
    {
        items.clear();
        try
        {
            items.put( END_ITEM );
        }
        catch ( InterruptedException e )
        {
            currentThread().interrupt();
        }
    }

    if ( pumper.hasErrors() )
    {
        pumper.throwErrors();
    }
}
{code}
_ThreadedStreamConsumer#run()_
{code:java}
@Override
public void run()
{
    while ( !ThreadedStreamConsumer.this.stop.get() )
    {
        try
        {
            String item = ThreadedStreamConsumer.this.items.take();
            if ( shouldStopQueueing( item ) )
            {
                return;
            }
            target.consumeLine( item );
        }
        catch ( Throwable t )
        {
            errors.addException( t );
        }
    }
}
{code}
_ForkClient#consumeLine&processLine&createReportEntry_
{code:java}
@Override
public final void consumeLine( String s )
{
    if ( isNotBlank( s ) )
    {
        processLine( s );
    }
}

private void processLine( String event )
{
    final OperationalData op;
    try
    {
        op = new OperationalData( event );
    }
    catch ( RuntimeException e )
    {
        logStreamWarning( e, event );
        return;
    }
    final String remaining = op.getData();
    switch ( op.getOperationId() )
    {
        case BOOTERCODE_TESTSET_STARTING:
            getTestSetReporter().testSetStarting( createReportEntry( remaining 
) );
            setCurrentStartTime();
            break;
        case BOOTERCODE_TESTSET_COMPLETED:
            testsInProgress.clear();

            getTestSetReporter().testSetCompleted( createReportEntry( 
remaining, testVmSystemProperties ) );
            break;
        case BOOTERCODE_TEST_STARTING:
            ReportEntry reportEntry = createReportEntry( remaining );
            testsInProgress.offer( reportEntry.getSourceName() );

            getTestSetReporter().testStarting( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_SUCCEEDED:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testSucceeded( createReportEntry( remaining ) 
);
            break;
        case BOOTERCODE_TEST_FAILED:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testFailed( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_SKIPPED:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testSkipped( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_ERROR:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testError( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_ASSUMPTIONFAILURE:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testAssumptionFailure( createReportEntry( 
remaining ) );
            break;
        case BOOTERCODE_SYSPROPS:
            int keyEnd = remaining.indexOf( "," );
            StringBuilder key = new StringBuilder();
            StringBuilder value = new StringBuilder();
            unescapeString( key, remaining.substring( 0, keyEnd ) );
            unescapeString( value, remaining.substring( keyEnd + 1 ) );
            testVmSystemProperties.put( key.toString(), value.toString() );
            break;
        
        ...other cases omitted...
        
    }
}

private TestSetReportEntry createReportEntry( String untokenized, Map<String, 
String> systemProperties )
{
    StringTokenizer tokens = new StringTokenizer( untokenized, "," );
    try
    {
        String source = nullableCsv( tokens.nextToken() );
        String name = nullableCsv( tokens.nextToken() );
        String group = nullableCsv( tokens.nextToken() );
        String message = nullableCsv( tokens.nextToken() );
        String elapsedStr = tokens.nextToken();
        Integer elapsed = "null".equals( elapsedStr ) ? null : decode( 
elapsedStr );
        final StackTraceWriter stackTraceWriter =
                tokens.hasMoreTokens() ? deserializeStackTraceWriter( tokens ) 
: null;

        return reportEntry( source, name, group, stackTraceWriter, elapsed, 
message, systemProperties );
    }
    catch ( RuntimeException e )
    {
        throw new RuntimeException( untokenized, e );
    }
}
{code}
 

And according to the dump, one of the detailedMessage of RuntimeException 
incurring NoSuchElementException is below
{code:java}
823 [main] INFO util.RetryUtil: 30176 milliseconds delay, attempt 5's result is 
null
 {code}
Apparently, it doesn't contain _*source*_, *_name_*, *_group_* and _*message*_ 
and possible stack trace separated by a comma(,) like createReportEntry 
expects, thus causes NoSuchElementException.  And the above message is just one 
piece of log entry during test run, it's barely a test result.

 

So my question becomes
 # What should the _item_ object feeding to method _processLine or consumeLine_ 
look like?  Is it the whole test output of forked process? Is it case-level or 
class-level?
 # Are there any documentation of inter-process communication between the main 
process and the forked process? I deeply doubt this issue would be resolved 
more quickly if I understand the mechanism. 
 # How to resolve this issue anyway? Upgrading maven-surefire-plugin is not an 
option right now due to corporate policy

 

Appreciate for any suggestions.

 


was (Author: jingfei):
[~tibordigana] Thanks a lot for your reply, but i still think the call stack 
should provide me some trace to root cause. And we are finally able to get a 
dump of the parent process (vs. forked process) and locate where the heap is 
occupied. See below screenshot.

!image-2020-12-29-17-14-42-884.png|width=585,height=239!

We can see it's an object of 
org.apache.maven.plugin.surefire.booterclient.output.MultipleFailureException 
which is a subclass of IOException consumes a lot of memory. And we keep 
looking into what is inside of the object and find out the content is all the 
logs printed out by test execution in the form of thousands of RuntimeException 
objects residing the in the linked queue of MultipleFailureException. Below is 
the method _getLocalizedMessage_ of MultipleFailureException which I think 
contributes to OOM

 
{code:java}
@Override
public String getLocalizedMessage()
{
    StringBuilder messages = new StringBuilder();
    for ( Throwable exception : exceptions )
    {
        if ( messages.length() != 0 )
        {
            messages.append( '\n' );
        }
        String message = exception.getLocalizedMessage();
        messages.append( message == null ? exception.toString() : message );
    }
    return messages.toString();
}
{code}
 

Based on the dump, below is my detailed analysis with the sequence diagram.

!image-2020-12-29-17-18-56-557.png!

Followed by _ThreadedStreamConsumer#run_ method,

!image-2020-12-29-17-19-51-660.png|width=606,height=370!

Here is the key code snippets.

_ThreadedStreamConsumer#close()_
{code:java}
@Override
public void close()
        throws IOException
{
    if ( stop.compareAndSet( false, true ) )
    {
        items.clear();
        try
        {
            items.put( END_ITEM );
        }
        catch ( InterruptedException e )
        {
            currentThread().interrupt();
        }
    }

    if ( pumper.hasErrors() )
    {
        pumper.throwErrors();
    }
}
{code}
_ThreadedStreamConsumer#run()_
{code:java}
@Override
public void run()
{
    while ( !ThreadedStreamConsumer.this.stop.get() )
    {
        try
        {
            String item = ThreadedStreamConsumer.this.items.take();
            if ( shouldStopQueueing( item ) )
            {
                return;
            }
            target.consumeLine( item );
        }
        catch ( Throwable t )
        {
            errors.addException( t );
        }
    }
}
{code}
_ForkClient#consumeLine&processLine&createReportEntry_
{code:java}
@Override
public final void consumeLine( String s )
{
    if ( isNotBlank( s ) )
    {
        processLine( s );
    }
}

private void processLine( String event )
{
    final OperationalData op;
    try
    {
        op = new OperationalData( event );
    }
    catch ( RuntimeException e )
    {
        logStreamWarning( e, event );
        return;
    }
    final String remaining = op.getData();
    switch ( op.getOperationId() )
    {
        case BOOTERCODE_TESTSET_STARTING:
            getTestSetReporter().testSetStarting( createReportEntry( remaining 
) );
            setCurrentStartTime();
            break;
        case BOOTERCODE_TESTSET_COMPLETED:
            testsInProgress.clear();

            getTestSetReporter().testSetCompleted( createReportEntry( 
remaining, testVmSystemProperties ) );
            break;
        case BOOTERCODE_TEST_STARTING:
            ReportEntry reportEntry = createReportEntry( remaining );
            testsInProgress.offer( reportEntry.getSourceName() );

            getTestSetReporter().testStarting( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_SUCCEEDED:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testSucceeded( createReportEntry( remaining ) 
);
            break;
        case BOOTERCODE_TEST_FAILED:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testFailed( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_SKIPPED:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testSkipped( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_ERROR:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testError( createReportEntry( remaining ) );
            break;
        case BOOTERCODE_TEST_ASSUMPTIONFAILURE:
            reportEntry = createReportEntry( remaining );
            testsInProgress.remove( reportEntry.getSourceName() );

            getTestSetReporter().testAssumptionFailure( createReportEntry( 
remaining ) );
            break;
        case BOOTERCODE_SYSPROPS:
            int keyEnd = remaining.indexOf( "," );
            StringBuilder key = new StringBuilder();
            StringBuilder value = new StringBuilder();
            unescapeString( key, remaining.substring( 0, keyEnd ) );
            unescapeString( value, remaining.substring( keyEnd + 1 ) );
            testVmSystemProperties.put( key.toString(), value.toString() );
            break;
        
        ...other cases omitted...
        
    }
}

private TestSetReportEntry createReportEntry( String untokenized, Map<String, 
String> systemProperties )
{
    StringTokenizer tokens = new StringTokenizer( untokenized, "," );
    try
    {
        String source = nullableCsv( tokens.nextToken() );
        String name = nullableCsv( tokens.nextToken() );
        String group = nullableCsv( tokens.nextToken() );
        String message = nullableCsv( tokens.nextToken() );
        String elapsedStr = tokens.nextToken();
        Integer elapsed = "null".equals( elapsedStr ) ? null : decode( 
elapsedStr );
        final StackTraceWriter stackTraceWriter =
                tokens.hasMoreTokens() ? deserializeStackTraceWriter( tokens ) 
: null;

        return reportEntry( source, name, group, stackTraceWriter, elapsed, 
message, systemProperties );
    }
    catch ( RuntimeException e )
    {
        throw new RuntimeException( untokenized, e );
    }
}
{code}
 

So my question becomes
 # Why would a simple StringTokenizer#nextToken throw NoSuchElementException? I 
can see no apparent reason to cause this?
 # How to workaround this issue? Is it related to the log format of my tests?

 

Appreciate for any suggestions.

 

> OutOfMemoryError running with TestNG
> ------------------------------------
>
>                 Key: SUREFIRE-1808
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1808
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Maven Surefire Plugin, TestNG support
>    Affects Versions: 3.0.0-M3
>         Environment: OS: Linux
>            Reporter: Jingfei Hu
>            Priority: Major
>              Labels: OOM, TestNG, surefire
>         Attachments: image-2020-12-29-13-31-31-539.png, 
> image-2020-12-29-13-58-44-076.png, image-2020-12-29-15-16-28-566.png, 
> image-2020-12-29-17-14-42-884.png, image-2020-12-29-17-18-56-557.png, 
> image-2020-12-29-17-19-51-660.png
>
>
> Hi team,
> We've been suffering OutOfMemory intermittently in our test labs recently. 
> The call stack is below 
> {noformat}
> [ERROR] Java heap space -> [Help 1]
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:3332)
>     at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>     at java.lang.StringBuilder.append(StringBuilder.java:136)
>     at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter$CloseableCloser.run(ForkStarter.java:200)
>     at 
> org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.CommandLineUtils$1.call(CommandLineUtils.java:301)
>     at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:615)
>     at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:283)
>     at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
>     at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1161)
>     at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1002)
>     at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:848)
>     at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
>     at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
>     at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
>     at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
>     at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
>     at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
>     at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>     at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
>     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
>     at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
>     at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
>     at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
>     at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
>     at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>     at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229){noformat}
>  
> Our test case count is +4k and still increasing, and pass rate is around 95% 
> for each execution. The OOM error is not always reproducible. And we set 
> memory dump parameters, but there is no dump file generated when it occurs. 
> The line of code throwing the exception is below. 
> !image-2020-12-29-15-16-28-566.png|width=763,height=460!
>  
> More version information:
> maven-surefire-plugin: 3.0.0-M3
> testng: 6.4
> maven: 3.2.5



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to