[ https://issues.apache.org/jira/browse/SUREFIRE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255829#comment-17255829 ]
Jingfei Hu edited comment on SUREFIRE-1808 at 12/29/20, 10:12 AM: ------------------------------------------------------------------ [~tibordigana] Thanks a lot for your reply, but i still think the call stack should provide me some trace to root cause. And we are finally able to get a dump of the parent process (vs. forked process) and locate where the heap is occupied. See below screenshot. !image-2020-12-29-17-14-42-884.png|width=585,height=239! We can see it's an object of org.apache.maven.plugin.surefire.booterclient.output.MultipleFailureException which is a subclass of IOException consumes a lot of memory. And we keep looking into what is inside of the object and find out the content is all the logs printed out by test execution in the form of thousands of RuntimeException objects residing the in the linked queue of MultipleFailureException. Below is the method _getLocalizedMessage_ of MultipleFailureException which I think contributes to OOM {code:java} @Override public String getLocalizedMessage() { StringBuilder messages = new StringBuilder(); for ( Throwable exception : exceptions ) { if ( messages.length() != 0 ) { messages.append( '\n' ); } String message = exception.getLocalizedMessage(); messages.append( message == null ? exception.toString() : message ); } return messages.toString(); } {code} Based on the dump, below is my detailed analysis with the sequence diagram. !image-2020-12-29-17-18-56-557.png! Followed by _ThreadedStreamConsumer#run_ method, !image-2020-12-29-17-19-51-660.png|width=606,height=370! Here is the key code snippets. _ThreadedStreamConsumer#close()_ {code:java} @Override public void close() throws IOException { if ( stop.compareAndSet( false, true ) ) { items.clear(); try { items.put( END_ITEM ); } catch ( InterruptedException e ) { currentThread().interrupt(); } } if ( pumper.hasErrors() ) { pumper.throwErrors(); } } {code} _ThreadedStreamConsumer#run()_ {code:java} @Override public void run() { while ( !ThreadedStreamConsumer.this.stop.get() ) { try { String item = ThreadedStreamConsumer.this.items.take(); if ( shouldStopQueueing( item ) ) { return; } target.consumeLine( item ); } catch ( Throwable t ) { errors.addException( t ); } } } {code} _ForkClient#consumeLine&processLine&createReportEntry_ {code:java} @Override public final void consumeLine( String s ) { if ( isNotBlank( s ) ) { processLine( s ); } } private void processLine( String event ) { final OperationalData op; try { op = new OperationalData( event ); } catch ( RuntimeException e ) { logStreamWarning( e, event ); return; } final String remaining = op.getData(); switch ( op.getOperationId() ) { case BOOTERCODE_TESTSET_STARTING: getTestSetReporter().testSetStarting( createReportEntry( remaining ) ); setCurrentStartTime(); break; case BOOTERCODE_TESTSET_COMPLETED: testsInProgress.clear(); getTestSetReporter().testSetCompleted( createReportEntry( remaining, testVmSystemProperties ) ); break; case BOOTERCODE_TEST_STARTING: ReportEntry reportEntry = createReportEntry( remaining ); testsInProgress.offer( reportEntry.getSourceName() ); getTestSetReporter().testStarting( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_SUCCEEDED: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testSucceeded( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_FAILED: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testFailed( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_SKIPPED: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testSkipped( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_ERROR: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testError( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_ASSUMPTIONFAILURE: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testAssumptionFailure( createReportEntry( remaining ) ); break; case BOOTERCODE_SYSPROPS: int keyEnd = remaining.indexOf( "," ); StringBuilder key = new StringBuilder(); StringBuilder value = new StringBuilder(); unescapeString( key, remaining.substring( 0, keyEnd ) ); unescapeString( value, remaining.substring( keyEnd + 1 ) ); testVmSystemProperties.put( key.toString(), value.toString() ); break; ...other cases omitted... } } private TestSetReportEntry createReportEntry( String untokenized, Map<String, String> systemProperties ) { StringTokenizer tokens = new StringTokenizer( untokenized, "," ); try { String source = nullableCsv( tokens.nextToken() ); String name = nullableCsv( tokens.nextToken() ); String group = nullableCsv( tokens.nextToken() ); String message = nullableCsv( tokens.nextToken() ); String elapsedStr = tokens.nextToken(); Integer elapsed = "null".equals( elapsedStr ) ? null : decode( elapsedStr ); final StackTraceWriter stackTraceWriter = tokens.hasMoreTokens() ? deserializeStackTraceWriter( tokens ) : null; return reportEntry( source, name, group, stackTraceWriter, elapsed, message, systemProperties ); } catch ( RuntimeException e ) { throw new RuntimeException( untokenized, e ); } } {code} And according to the dump, one of the detailedMessage of RuntimeException incurring NoSuchElementException is below {code:java} 823 [main] INFO util.RetryUtil: 30176 milliseconds delay, attempt 5's result is null {code} Apparently, it doesn't contain _*source*_, *_name_*, *_group_* and _*message*_ and possible stack trace separated by a comma(,) like createReportEntry expects, thus causes NoSuchElementException. And the above message is just one piece of log entry during test run, it's barely a test result. So my question becomes # What should the _item_ object feeding to method _processLine or consumeLine_ look like? Is it the whole test output of forked process? Is it case-level or class-level? # Are there any documentation of inter-process communication between the main process and the forked process? I deeply doubt this issue would be resolved more quickly if I understand the mechanism. # How to resolve this issue anyway? Upgrading maven-surefire-plugin is not an option right now due to corporate policy Appreciate for any suggestions. was (Author: jingfei): [~tibordigana] Thanks a lot for your reply, but i still think the call stack should provide me some trace to root cause. And we are finally able to get a dump of the parent process (vs. forked process) and locate where the heap is occupied. See below screenshot. !image-2020-12-29-17-14-42-884.png|width=585,height=239! We can see it's an object of org.apache.maven.plugin.surefire.booterclient.output.MultipleFailureException which is a subclass of IOException consumes a lot of memory. And we keep looking into what is inside of the object and find out the content is all the logs printed out by test execution in the form of thousands of RuntimeException objects residing the in the linked queue of MultipleFailureException. Below is the method _getLocalizedMessage_ of MultipleFailureException which I think contributes to OOM {code:java} @Override public String getLocalizedMessage() { StringBuilder messages = new StringBuilder(); for ( Throwable exception : exceptions ) { if ( messages.length() != 0 ) { messages.append( '\n' ); } String message = exception.getLocalizedMessage(); messages.append( message == null ? exception.toString() : message ); } return messages.toString(); } {code} Based on the dump, below is my detailed analysis with the sequence diagram. !image-2020-12-29-17-18-56-557.png! Followed by _ThreadedStreamConsumer#run_ method, !image-2020-12-29-17-19-51-660.png|width=606,height=370! Here is the key code snippets. _ThreadedStreamConsumer#close()_ {code:java} @Override public void close() throws IOException { if ( stop.compareAndSet( false, true ) ) { items.clear(); try { items.put( END_ITEM ); } catch ( InterruptedException e ) { currentThread().interrupt(); } } if ( pumper.hasErrors() ) { pumper.throwErrors(); } } {code} _ThreadedStreamConsumer#run()_ {code:java} @Override public void run() { while ( !ThreadedStreamConsumer.this.stop.get() ) { try { String item = ThreadedStreamConsumer.this.items.take(); if ( shouldStopQueueing( item ) ) { return; } target.consumeLine( item ); } catch ( Throwable t ) { errors.addException( t ); } } } {code} _ForkClient#consumeLine&processLine&createReportEntry_ {code:java} @Override public final void consumeLine( String s ) { if ( isNotBlank( s ) ) { processLine( s ); } } private void processLine( String event ) { final OperationalData op; try { op = new OperationalData( event ); } catch ( RuntimeException e ) { logStreamWarning( e, event ); return; } final String remaining = op.getData(); switch ( op.getOperationId() ) { case BOOTERCODE_TESTSET_STARTING: getTestSetReporter().testSetStarting( createReportEntry( remaining ) ); setCurrentStartTime(); break; case BOOTERCODE_TESTSET_COMPLETED: testsInProgress.clear(); getTestSetReporter().testSetCompleted( createReportEntry( remaining, testVmSystemProperties ) ); break; case BOOTERCODE_TEST_STARTING: ReportEntry reportEntry = createReportEntry( remaining ); testsInProgress.offer( reportEntry.getSourceName() ); getTestSetReporter().testStarting( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_SUCCEEDED: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testSucceeded( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_FAILED: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testFailed( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_SKIPPED: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testSkipped( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_ERROR: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testError( createReportEntry( remaining ) ); break; case BOOTERCODE_TEST_ASSUMPTIONFAILURE: reportEntry = createReportEntry( remaining ); testsInProgress.remove( reportEntry.getSourceName() ); getTestSetReporter().testAssumptionFailure( createReportEntry( remaining ) ); break; case BOOTERCODE_SYSPROPS: int keyEnd = remaining.indexOf( "," ); StringBuilder key = new StringBuilder(); StringBuilder value = new StringBuilder(); unescapeString( key, remaining.substring( 0, keyEnd ) ); unescapeString( value, remaining.substring( keyEnd + 1 ) ); testVmSystemProperties.put( key.toString(), value.toString() ); break; ...other cases omitted... } } private TestSetReportEntry createReportEntry( String untokenized, Map<String, String> systemProperties ) { StringTokenizer tokens = new StringTokenizer( untokenized, "," ); try { String source = nullableCsv( tokens.nextToken() ); String name = nullableCsv( tokens.nextToken() ); String group = nullableCsv( tokens.nextToken() ); String message = nullableCsv( tokens.nextToken() ); String elapsedStr = tokens.nextToken(); Integer elapsed = "null".equals( elapsedStr ) ? null : decode( elapsedStr ); final StackTraceWriter stackTraceWriter = tokens.hasMoreTokens() ? deserializeStackTraceWriter( tokens ) : null; return reportEntry( source, name, group, stackTraceWriter, elapsed, message, systemProperties ); } catch ( RuntimeException e ) { throw new RuntimeException( untokenized, e ); } } {code} So my question becomes # Why would a simple StringTokenizer#nextToken throw NoSuchElementException? I can see no apparent reason to cause this? # How to workaround this issue? Is it related to the log format of my tests? Appreciate for any suggestions. > OutOfMemoryError running with TestNG > ------------------------------------ > > Key: SUREFIRE-1808 > URL: https://issues.apache.org/jira/browse/SUREFIRE-1808 > Project: Maven Surefire > Issue Type: Bug > Components: Maven Surefire Plugin, TestNG support > Affects Versions: 3.0.0-M3 > Environment: OS: Linux > Reporter: Jingfei Hu > Priority: Major > Labels: OOM, TestNG, surefire > Attachments: image-2020-12-29-13-31-31-539.png, > image-2020-12-29-13-58-44-076.png, image-2020-12-29-15-16-28-566.png, > image-2020-12-29-17-14-42-884.png, image-2020-12-29-17-18-56-557.png, > image-2020-12-29-17-19-51-660.png > > > Hi team, > We've been suffering OutOfMemory intermittently in our test labs recently. > The call stack is below > {noformat} > [ERROR] Java heap space -> [Help 1] > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) > at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) > at java.lang.StringBuilder.append(StringBuilder.java:136) > at > org.apache.maven.plugin.surefire.booterclient.ForkStarter$CloseableCloser.run(ForkStarter.java:200) > at > org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.CommandLineUtils$1.call(CommandLineUtils.java:301) > at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:615) > at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:283) > at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246) > at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1161) > at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1002) > at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:848) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:191) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229){noformat} > > Our test case count is +4k and still increasing, and pass rate is around 95% > for each execution. The OOM error is not always reproducible. And we set > memory dump parameters, but there is no dump file generated when it occurs. > The line of code throwing the exception is below. > !image-2020-12-29-15-16-28-566.png|width=763,height=460! > > More version information: > maven-surefire-plugin: 3.0.0-M3 > testng: 6.4 > maven: 3.2.5 -- This message was sent by Atlassian Jira (v8.3.4#803005)