Shelley Lynn Hughes-Godfrey created GEODE-6200:
--------------------------------------------------
Summary: CI: netstat --with-lsof fails with OOME (when netstat command not found)
Key: GEODE-6200
URL: https://issues.apache.org/jira/browse/GEODE-6200
Project: Geode
Issue Type: Bug
Components: gfsh
Reporter: Shelley Lynn Hughes-Godfrey
org.apache.geode.management.internal.cli.NetstatDUnitTest >
testOutputToConsoleWithLsofForOneMember FAILED
{noformat}
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
org.apache.geode.management.internal.cli.NetstatDUnitTest >
testOutputToConsoleWithLsofForOneMember FAILED
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:649)
at java.lang.StringBuilder.append(StringBuilder.java:202)
at org.json.JSONStringer.string(JSONStringer.java:369)
at org.json.JSONStringer.value(JSONStringer.java:262)
at org.json.JSONArray.writeTo(JSONArray.java:732)
at org.json.JSONStringer.value(JSONStringer.java:231)
at org.json.JSONObject.writeTo(JSONObject.java:882)
at org.json.JSONStringer.value(JSONStringer.java:235)
at org.json.JSONObject.writeTo(JSONObject.java:882)
at org.json.JSONObject.toString(JSONObject.java:849)
at org.apache.geode.management.internal.cli.json.GfJsonObject.toString(GfJsonObject.java:301)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.geode.management.internal.cli.result.CommandResult.toString(CommandResult.java:508)
at org.apache.geode.management.internal.cli.NetstatDUnitTest.testOutputToConsoleWithLsofForOneMember(NetstatDUnitTest.java:104)
{noformat}
=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results Website
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
http://s3proxy.gemfire.pivotal.io/gemfire-test-results/9.5/distributedTest/1544666867/index.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
To download the test artifacts from this job, execute the following command
after the job has completed:
aws s3 cp s3://gemfire-build-artifacts/9.5/9.5.3-build.2/1544666867/distributedtestfiles-9.5.3-build.2.tgz .
At first glance this failure looks like GEODE-2488, which was fixed in March 2018.
The fix for GEODE-2488 marked the --with-lsof tests as @Ignore (tagged with this bug).
Later, the commit below added a new test
(testOutputToConsoleWithLsofForOneMember), so once again we are running
netstat --with-lsof, which produces a huge amount of output. All of that output
is read into a single buffer for parsing, which leads to the OOME. I don't think
this output comes from a successful execution of the netstat command, though:
the test output shows that the netstat command is not found (see below).
{noformat}
commit d2b263f9053f293a409c527d9c8b5ae17b745041
Author: Jens Deppe <[email protected]>
Date: Fri Jun 22 15:33:20 2018 -0700
GEODE-5335: Do not resolve addresses when calling netstat and lsof (#2070)
- This avoids long command pauses (or failures) if DNS is slow or
misconfigured.
- Add more netstat tests
(cherry picked from commit 908a5efe59c4a81be647bb82ba58a4ccba98e1ac)
{noformat}
{noformat}
+  public void testOutputToConsoleWithLsofForOneMember() throws Exception {
+    CommandResult result = gfsh.executeCommand("netstat --member=server-1 --with-lsof");
+    assertThat(result.getStatus()).isEqualTo(Result.Status.OK);
+
+    String rawOutput = result.getMessageFromContent();
+    String[] lines = rawOutput.split("\n");
+
+    assertThat(lines.length).isGreaterThan(5);
+
+    assertThat(lines[4].trim().split("[,\\s]+")).containsExactlyInAnyOrder("server-1");
+    assertThat(lines).filteredOn(e -> e.contains("## lsof output ##")).hasSize(1);
+  }
{noformat}
Interestingly, it looks like netstat fails here (from test output):
{noformat}
Command result for <netstat --member=server-1 --with-lsof>:
##########################################################
Host: ebc7313d51a3
OS: Linux 4.15.0-38-generic amd64
Member(s):
server-1
##########################################################
Could not execute "netstat". Reason: Cannot run program "netstat": error=2, No such file or directory
{noformat}
The output seems to be a huge listing ... starting with this:
{noformat}
################ lsof output ###################
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 1 root cwd DIR 0,59 44 280305 /tmp/build/ae3c03f4/built-gemfire/test/geode/geode-core/build/distributedTest1562
java 1 root rtd DIR 0,102 80 234603 /
java 1 root txt REG 0,102 8464 161745 /usr/lib/jvm/java-8-oracle/jre/bin/java
java 1 root mem REG 0,67 161745 /usr/lib/jvm/java-8-oracle/jre/bin/java (path dev=0,102)
java 1 root mem REG 0,67 162079 /usr/lib/jvm/java-8-oracle/jre/lib/resources.jar (path dev=0,102)
java 1 root mem REG 0,67 161955 /usr/lib/jvm/java-8-oracle/jre/lib/ext/cldrdata.jar (path dev=0,102)
java 1 root mem REG 0,67 161959 /usr/lib/jvm/java-8-oracle/jre/lib/ext/localedata.jar (path dev=0,102)
java 1 root mem REG 0,67 161961 /usr/lib/jvm/java-8-oracle/jre/lib/ext/nashorn.jar (path dev=0,102)
java 1 root mem REG 0,67 161810 /usr/lib/jvm/java-8-oracle/jre/lib/amd64/libmanagement.so (path dev=0,102)
java 1 root mem REG 0,67 142155 /lib/x86_64-linux-gnu/libgcc_s.so.1 (path dev=0,102)
java 1 root mem REG 0,67 169374 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21 (path dev=0,102)
java 1 root mem REG 0,44 13472 /tmp/build/ae3c03f4/cache/gradle/native/25/linux-amd64/libnative-platform.so (path dev=0,58)
{noformat}
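A listing of this size is what ends up held in one in-memory buffer. One possible mitigation, offered only as a sketch, is to cap how many characters of command output are buffered and to mark the result as truncated. The class BoundedOutput and the method readCapped are hypothetical names for illustration; they are not part of Geode, and this is not how the netstat command currently collects its output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper (not Geode code): buffers at most maxChars characters.
final class BoundedOutput {
  static final String TRUNCATION_MARKER = "\n... output truncated ...";

  /** Reads up to maxChars characters and appends a marker if more input remained. */
  static String readCapped(Reader source, int maxChars) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int remaining = maxChars;
    while (remaining > 0) {
      int n = source.read(buf, 0, Math.min(buf.length, remaining));
      if (n < 0) {
        return sb.toString(); // input ended before the cap was reached
      }
      sb.append(buf, 0, n);
      remaining -= n;
    }
    // The cap was reached; probe one more character to see whether anything was dropped.
    if (source.read() >= 0) {
      sb.append(TRUNCATION_MARKER);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for lsof output; a real caller would wrap the process's stdout.
    String listing = "COMMAND PID\njava 1\njava 1\n";
    System.out.println(readCapped(new StringReader(listing), 11));
  }
}
```

In practice the reader would wrap the child process's output stream (for example via an InputStreamReader), and the cap would need to be chosen relative to the test JVM's heap size.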
Note that this is not new: we saw the same failure 56 days ago (9.5.2 build 10):
http://concourse.gemfire.pivotal.io/teams/main/pipelines/gemfire-9.5/jobs/DistributedTest/builds/59
{noformat}
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
org.apache.geode.management.internal.cli.NetstatDUnitTest >
testOutputToConsoleWithLsofForOneMember FAILED
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:649)
at java.lang.StringBuilder.append(StringBuilder.java:202)
at java.util.AbstractCollection.toString(AbstractCollection.java:464)
at java.util.Vector.toString(Vector.java:1003)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.geode.management.internal.cli.result.CommandResult.toString(CommandResult.java:508)
at org.apache.geode.management.internal.cli.NetstatDUnitTest.testOutputToConsoleWithLsofForOneMember(NetstatDUnitTest.java:104)
Heap dump file created [442340167 bytes in 1.462 secs]
{noformat}
The logs show:
{noformat}
Command result for <netstat --with-lsof=true --file=/tmp/junit1796957499625851049/junit2425143231094040391/command.log.txt>:
Saved netstat output in the file /tmp/junit1796957499625851049/junit2425143231094040391/command.log.txt.
Command result for <netstat>:
########################################################
Host: 9aebab1d2525
OS: Linux 4.4.0-89-generic amd64
Member(s):
server-1, locator-0, server-2
########################################################
Could not execute "netstat". Reason: Cannot run program "netstat": error=2, No such file or directory
{noformat}
If netstat isn't found, are these tests even doing what they are supposed to?
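One way a test could make this explicit is to check up front whether the command is on the PATH, and skip or fail clearly when it is not. The following is a minimal sketch under that assumption; CommandLookup and isOnPath are hypothetical names, not existing Geode or JUnit APIs.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not Geode code): looks for an executable in PATH-style directories.
final class CommandLookup {
  /** Returns true if an executable regular file named command exists in any directory of pathValue. */
  static boolean isOnPath(String command, String pathValue) {
    if (pathValue == null || pathValue.isEmpty()) {
      return false;
    }
    for (String dir : pathValue.split(File.pathSeparator)) {
      if (dir.isEmpty()) {
        continue;
      }
      try {
        Path candidate = Paths.get(dir, command);
        if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
          return true;
        }
      } catch (InvalidPathException e) {
        // Ignore malformed PATH entries and keep searching.
      }
    }
    return false;
  }

  /** Convenience overload that searches the PATH of the current process. */
  static boolean isOnPath(String command) {
    return isOnPath(command, System.getenv("PATH"));
  }

  public static void main(String[] args) {
    System.out.println("netstat available: " + isOnPath("netstat"));
    System.out.println("lsof available: " + isOnPath("lsof"));
  }
}
```

A JUnit 4 test could pass the result to org.junit.Assume.assumeTrue so that the test is reported as skipped, rather than passing or running out of memory, on images that lack netstat.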