Shelley Lynn Hughes-Godfrey created GEODE-6200:
--------------------------------------------------

             Summary: CI: netstat --with-lsof fails with OOME (when netstat command not found)
                 Key: GEODE-6200
                 URL: https://issues.apache.org/jira/browse/GEODE-6200
             Project: Geode
          Issue Type: Bug
          Components: gfsh
            Reporter: Shelley Lynn Hughes-Godfrey


org.apache.geode.management.internal.cli.NetstatDUnitTest > testOutputToConsoleWithLsofForOneMember FAILED

{noformat}
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...

org.apache.geode.management.internal.cli.NetstatDUnitTest > testOutputToConsoleWithLsofForOneMember FAILED
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:649)
        at java.lang.StringBuilder.append(StringBuilder.java:202)
        at org.json.JSONStringer.string(JSONStringer.java:369)
        at org.json.JSONStringer.value(JSONStringer.java:262)
        at org.json.JSONArray.writeTo(JSONArray.java:732)
        at org.json.JSONStringer.value(JSONStringer.java:231)
        at org.json.JSONObject.writeTo(JSONObject.java:882)
        at org.json.JSONStringer.value(JSONStringer.java:235)
        at org.json.JSONObject.writeTo(JSONObject.java:882)
        at org.json.JSONObject.toString(JSONObject.java:849)
        at org.apache.geode.management.internal.cli.json.GfJsonObject.toString(GfJsonObject.java:301)
        at java.lang.String.valueOf(String.java:2994)
        at java.lang.StringBuilder.append(StringBuilder.java:131)
        at org.apache.geode.management.internal.cli.result.CommandResult.toString(CommandResult.java:508)
        at org.apache.geode.management.internal.cli.NetstatDUnitTest.testOutputToConsoleWithLsofForOneMember(NetstatDUnitTest.java:104)
{noformat}

=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results Website =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://s3proxy.gemfire.pivotal.io/gemfire-test-results/9.5/distributedTest/1544666867/index.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

To download the test
artifacts from this job, execute the following command after the job has completed:
aws s3 cp s3://gemfire-build-artifacts/9.5/9.5.3-build.2/1544666867/distributedtestfiles-9.5.3-build.2.tgz .

This failure initially looks like GEODE-2488 ... which was fixed in March 2018.
GEODE-2488 marked the --with-lsof tests as @Ignore (tagged with this bug).

Later, the commit below added the following test (testOutputToConsoleWithLsofForOneMember) ... so once again we are doing a netstat --with-lsof which is producing a huge amount of output ... all read into a single buffer for parsing which leads us to declare OOME.

I don't think this output is from a successful execution of the netstat command though -- the test output shows the netstat command is not found (see below).

{noformat}
commit d2b263f9053f293a409c527d9c8b5ae17b745041
Author: Jens Deppe <jde...@pivotal.io>
Date:   Fri Jun 22 15:33:20 2018 -0700

    GEODE-5335: Do not resolve addresses when calling netstat and lsof (#2070)

    - This avoids long command pauses (or failures) if DNS is slow or misconfigured.
    - Add more netstat tests

    (cherry picked from commit 908a5efe59c4a81be647bb82ba58a4ccba98e1ac)
{noformat}

{noformat}
+  public void testOutputToConsoleWithLsofForOneMember() throws Exception {
+    CommandResult result = gfsh.executeCommand("netstat --member=server-1 --with-lsof");
+    assertThat(result.getStatus()).isEqualTo(Result.Status.OK);
+
+    String rawOutput = result.getMessageFromContent();
+    String[] lines = rawOutput.split("\n");
+
+    assertThat(lines.length).isGreaterThan(5);
+    assertThat(lines[4].trim().split("[,\\s]+")).containsExactlyInAnyOrder("server-1");
+    assertThat(lines).filteredOn(e -> e.contains("## lsof output ##")).hasSize(1);
+  }
{noformat}

Interestingly, it looks like netstat fails here (from test output):

{noformat}
Command result for <netstat --member=server-1 --with-lsof>:
##########################################################
Host: ebc7313d51a3
OS: Linux 4.15.0-38-generic amd64
Member(s):
 server-1
##########################################################
Could not execute "netstat". Reason: Cannot run program "netstat": error=2, No such file or directory
{noformat}

The output seems to be a huge listing ...
starting with this:

{noformat}
################ lsof output ###################
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 1 root cwd DIR 0,59 44 280305 /tmp/build/ae3c03f4/built-gemfire/test/geode/geode-core/build/distributedTest1562
java 1 root rtd DIR 0,102 80 234603 /
java 1 root txt REG 0,102 8464 161745 /usr/lib/jvm/java-8-oracle/jre/bin/java
java 1 root mem REG 0,67 161745 /usr/lib/jvm/java-8-oracle/jre/bin/java (path dev=0,102)
java 1 root mem REG 0,67 162079 /usr/lib/jvm/java-8-oracle/jre/lib/resources.jar (path dev=0,102)
java 1 root mem REG 0,67 161955 /usr/lib/jvm/java-8-oracle/jre/lib/ext/cldrdata.jar (path dev=0,102)
java 1 root mem REG 0,67 161959 /usr/lib/jvm/java-8-oracle/jre/lib/ext/localedata.jar (path dev=0,102)
java 1 root mem REG 0,67 161961 /usr/lib/jvm/java-8-oracle/jre/lib/ext/nashorn.jar (path dev=0,102)
java 1 root mem REG 0,67 161810 /usr/lib/jvm/java-8-oracle/jre/lib/amd64/libmanagement.so (path dev=0,102)
java 1 root mem REG 0,67 142155 /lib/x86_64-linux-gnu/libgcc_s.so.1 (path dev=0,102)
java 1 root mem REG 0,67 169374 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21 (path dev=0,102)
java 1 root mem REG 0,44 13472 /tmp/build/ae3c03f4/cache/gradle/native/25/linux-amd64/libnative-platform.so (path dev=0,58)
{noformat}

Note that this is not new ... we see this 56 days ago (9.5.2 build 10):
http://concourse.gemfire.pivotal.io/teams/main/pipelines/gemfire-9.5/jobs/DistributedTest/builds/59

{noformat}
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
org.apache.geode.management.internal.cli.NetstatDUnitTest > testOutputToConsoleWithLsofForOneMember FAILED
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:649)
        at java.lang.StringBuilder.append(StringBuilder.java:202)
        at java.util.AbstractCollection.toString(AbstractCollection.java:464)
        at java.util.Vector.toString(Vector.java:1003)
        at java.lang.String.valueOf(String.java:2994)
        at java.lang.StringBuilder.append(StringBuilder.java:131)
        at org.apache.geode.management.internal.cli.result.CommandResult.toString(CommandResult.java:508)
        at org.apache.geode.management.internal.cli.NetstatDUnitTest.testOutputToConsoleWithLsofForOneMember(NetstatDUnitTest.java:104)
Heap dump file created [442340167 bytes in 1.462 secs]
{noformat}

logs show:

{noformat}
Command result for <netstat --with-lsof=true --file=/tmp/junit1796957499625851049/junit2425143231094040391/command.log.txt>:
Saved netstat output in the file /tmp/junit1796957499625851049/junit2425143231094040391/command.log.txt.

Command result for <netstat>:
########################################################
Host: 9aebab1d2525
OS: Linux 4.4.0-89-generic amd64
Member(s):
 server-1, locator-0, server-2
########################################################
Could not execute "netstat". Reason: Cannot run program "netstat": error=2, No such file or directory
{noformat}

If netstat isn't found ... are these tests even doing what they are supposed to?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
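
On the closing question about netstat not being found: one possible way for a test to detect this up front is to check whether the executable is on the PATH before running the command. The following is only a minimal sketch under stated assumptions, not code from Geode. The class name CommandLookup and the method isOnPath are hypothetical names introduced here for illustration, and the lookup only inspects the directories of the PATH value it is given.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Geode): checks whether a command such as
// "netstat" can be found as an executable file in a PATH-style directory list.
class CommandLookup {

  // Returns true if an executable regular file named `command` exists in any
  // directory listed in `pathValue` (entries separated by File.pathSeparator).
  static boolean isOnPath(String command, String pathValue) {
    if (pathValue == null || pathValue.isEmpty()) {
      return false;
    }
    for (String dir : pathValue.split(File.pathSeparator)) {
      if (dir.isEmpty()) {
        continue; // skip empty PATH entries
      }
      Path candidate = Paths.get(dir, command);
      if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Reports whether netstat and lsof are visible to this JVM's environment.
    String path = System.getenv("PATH");
    System.out.println("netstat on PATH: " + isOnPath("netstat", path));
    System.out.println("lsof on PATH: " + isOnPath("lsof", path));
  }
}
```

In a JUnit 4 test, a check like this could be combined with org.junit.Assume.assumeTrue(...) so that the test is skipped rather than run against an environment without netstat. Whether skipping or failing is the right behavior for NetstatDUnitTest is a decision for the Geode maintainers.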