[
https://issues.apache.org/jira/browse/GEODE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243444#comment-17243444
]
Bill Burcham edited comment on GEODE-8730 at 12/3/20, 7:53 PM:
---------------------------------------------------------------
From the IDE I ran the docker Gradle task in geode-assembly to create a fresh
Geode Docker image.
Then from
/Users/bburcham/Projects/geode/geode-assembly/src/acceptanceTest/resources/org/apache/geode/client/sni
I ran "docker-compose up" and from the Docker app dashboard I opened a shell
into the running ("geode") container.
Once in, I ran "apt-get update" and "apt-get install net-tools", then checked
the listening sockets before starting Geode:
{noformat}
# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
tcp        0      0 127.0.0.11:37221   0.0.0.0:*        LISTEN  -
udp        0      0 127.0.0.11:42103   0.0.0.0:*                -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
{noformat}
I then ran the gfsh startup script: "gfsh run
--file=/geode/scripts/geode-starter-2.gfsh". For reference, that file contains:
{noformat}
start locator --name=locator-maeve --connect=false --redirect-output --hostname-for-clients=locator-maeve --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
start server --name=server-dolores --group=group-dolores --hostname-for-clients=server-dolores --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
start server --name=server-clementine --group=group-clementine --hostname-for-clients=server-clementine --server-port=40405 --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
connect --locator=geode[10334] --use-ssl=true --security-properties-file=/geode/config/gfsecurity.properties
create region --name=region-dolores --group=group-dolores --type=REPLICATE
create region --name=region-clementine --group=group-clementine --type=REPLICATE
{noformat}
With the locator and both servers started, netstat showed:
{noformat}
# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address  State   PID/Program name
tcp        0      0 geode:46867        0.0.0.0:*        LISTEN  256/java
tcp        0      0 geode:43540        0.0.0.0:*        LISTEN  515/java
tcp        0      0 0.0.0.0:40404      0.0.0.0:*        LISTEN  419/java
tcp        0      0 0.0.0.0:40405      0.0.0.0:*        LISTEN  515/java
tcp        0      0 0.0.0.0:40053      0.0.0.0:*        LISTEN  515/java
tcp        0      0 0.0.0.0:46649      0.0.0.0:*        LISTEN  256/java
tcp        0      0 geode:57053        0.0.0.0:*        LISTEN  256/java
tcp        0      0 geode:55518        0.0.0.0:*        LISTEN  515/java
tcp        0      0 geode:55486        0.0.0.0:*        LISTEN  419/java
tcp        0      0 0.0.0.0:7070       0.0.0.0:*        LISTEN  256/java
tcp        0      0 0.0.0.0:10334      0.0.0.0:*        LISTEN  256/java
tcp        0      0 0.0.0.0:33953      0.0.0.0:*        LISTEN  419/java
tcp        0      0 127.0.0.11:37221   0.0.0.0:*        LISTEN  -
tcp        0      0 geode:48715        0.0.0.0:*        LISTEN  419/java
tcp        0      0 0.0.0.0:1099       0.0.0.0:*        LISTEN  256/java
udp        0      0 geode:41000        0.0.0.0:*                256/java
udp        0      0 geode:41001        0.0.0.0:*                419/java
udp        0      0 geode:41002        0.0.0.0:*                515/java
udp        0      0 127.0.0.11:42103   0.0.0.0:*                -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ACC ]     STREAM     LISTENING     155159   256/java             /tmp/.java_pid256.tmp
unix  2      [ ACC ]     STREAM     LISTENING     158787   419/java             /tmp/.java_pid419.tmp
unix  2      [ ACC ]     STREAM     LISTENING     159071   515/java             /tmp/.java_pid515.tmp
{noformat}
Grouping these by PID: locator first, then cache servers:
{noformat}
tcp        0      0 0.0.0.0:10334      0.0.0.0:*        LISTEN  256/java  for locator clients
tcp        0      0 0.0.0.0:1099       0.0.0.0:*        LISTEN  256/java  for gfsh
tcp        0      0 0.0.0.0:7070       0.0.0.0:*        LISTEN  256/java  for browser (pulse)
tcp        0      0 0.0.0.0:46649      0.0.0.0:*        LISTEN  256/java
tcp        0      0 geode:46867        0.0.0.0:*        LISTEN  256/java
tcp        0      0 geode:57053        0.0.0.0:*        LISTEN  256/java
udp        0      0 geode:41000        0.0.0.0:*                256/java  for membership
tcp        0      0 0.0.0.0:40404      0.0.0.0:*        LISTEN  419/java  for client's cache
tcp        0      0 0.0.0.0:33953 ***  0.0.0.0:*        LISTEN  419/java
tcp        0      0 geode:55486        0.0.0.0:*        LISTEN  419/java  for health
tcp        0      0 geode:48715        0.0.0.0:*        LISTEN  419/java  for peer's cache
udp        0      0 geode:41001        0.0.0.0:*                419/java  for membership
tcp        0      0 0.0.0.0:40405      0.0.0.0:*        LISTEN  515/java  for client's cache
tcp        0      0 0.0.0.0:40053 ***  0.0.0.0:*        LISTEN  515/java
tcp        0      0 geode:43540        0.0.0.0:*        LISTEN  515/java  for peer's cache
tcp        0      0 geode:55518        0.0.0.0:*        LISTEN  515/java  for health
udp        0      0 geode:41002        0.0.0.0:*                515/java  for membership
{noformat}
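The grouping above was done by hand. As a rough illustration only, the same grouping could be scripted. This is a minimal Python sketch and not part of Geode or the acceptance test: `group_by_pid` and `SAMPLE` are hypothetical names, and the sample is an abbreviated copy of a few rows from the listing, with each socket on a single line.

```python
from collections import defaultdict

# Abbreviated sample: a few rows copied from the netstat listing above,
# one socket per line (the real output has more rows and header lines).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:10334 0.0.0.0:* LISTEN 256/java
tcp 0 0 0.0.0.0:40404 0.0.0.0:* LISTEN 419/java
tcp 0 0 0.0.0.0:33953 0.0.0.0:* LISTEN 419/java
tcp 0 0 0.0.0.0:40405 0.0.0.0:* LISTEN 515/java
tcp 0 0 0.0.0.0:40053 0.0.0.0:* LISTEN 515/java
udp 0 0 geode:41001 0.0.0.0:* 419/java
tcp 0 0 127.0.0.11:37221 0.0.0.0:* LISTEN -
"""


def group_by_pid(netstat_text):
    """Map PID -> list of (proto, local_address) for each tcp/udp row."""
    groups = defaultdict(list)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UNIX-domain socket rows.
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue
        proto, local = fields[0], fields[3]
        owner = fields[-1]             # e.g. "256/java", or "-" if unknown
        pid = owner.split("/")[0]
        groups[pid].append((proto, local))
    return dict(groups)


groups = group_by_pid(SAMPLE)
for pid, sockets in groups.items():
    print(pid, sockets)
```

Rows whose owner is "-" (such as the 127.0.0.11 listeners) end up under the key "-", which keeps the sockets that exist before Geode starts separate from the three Java processes.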
I've highlighted with "***" the two bindings that look odd. Both are ephemeral
ports, but neither falls within the default (configured) port range
41000-61000. I expect they differ on each run and are the cause of this bug.
I searched the logs for those ports and didn't find them, so what are those
bindings? A cache server binds these TCP ports:
* client's cache (40404, 40405 above)
* peer's cache ostensibly in port range (41000-61000)
* health monitoring also ostensibly in port range (41000-61000)
Each cache server shows three TCP bindings besides its client port in the
netstat output above, but we only have categories for two of them (peer's
cache, health monitoring). What's the third category?
In summary, we have two unexplained bindings (one per cache server), plus one
unexplained TCP binding that exists before Geode even starts (see the first
netstat above).
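To make the "outside the range" check concrete, here is a small Python sketch under stated assumptions. The port numbers are copied by hand from the grouped listing; `PORT_RANGE`, `EXPLAINED_PORTS`, and `unexplained_ports` are hypothetical names, and the set of explained ports is simply the ones annotated with a purpose above. It only tests range membership, so an unlabelled port that happens to fall inside 41000-61000 is not flagged.

```python
# Assumption: the configured range quoted above, 41000-61000 inclusive.
PORT_RANGE = range(41000, 61001)

# Assumption: fixed ports whose purpose is annotated in the grouped listing.
EXPLAINED_PORTS = {10334, 1099, 7070, 40404, 40405}

# TCP listening ports per PID, copied by hand from the grouped listing.
TCP_LISTENERS = {
    "256": [10334, 1099, 7070, 46649, 46867, 57053],
    "419": [40404, 33953, 55486, 48715],
    "515": [40405, 40053, 43540, 55518],
}


def unexplained_ports(listeners):
    """Return, per PID, the ports that are neither annotated nor in range."""
    return {
        pid: [
            port
            for port in ports
            if port not in EXPLAINED_PORTS and port not in PORT_RANGE
        ]
        for pid, ports in listeners.items()
    }


flagged = unexplained_ports(TCP_LISTENERS)
for pid, ports in flagged.items():
    print(pid, ports)
```

With the numbers from this run, only 33953 (PID 419) and 40053 (PID 515) are flagged, which matches the two "***" rows.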
A jstack (stack dump) showed that RMI is the culprit for those unexplained
cache server ports. "jstack 419 | less" showed:
{noformat}
"RMI TCP Accept-0" #26 daemon prio=9 os_prio=0 cpu=3.69ms elapsed=5311.36s tid=0x00007fbe28005800 nid=0x1c5 runnable [0x00007fbe38862000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept([email protected]/Native Method)
        at java.net.AbstractPlainSocketImpl.accept([email protected]/AbstractPlainSocketImpl.java:458)
        at java.net.ServerSocket.implAccept([email protected]/ServerSocket.java:565)
        at java.net.ServerSocket.accept([email protected]/ServerSocket.java:533)
        at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept([email protected]/LocalRMIServerSocketFactory.java:52)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop([email protected]/TCPTransport.java:394)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run([email protected]/TCPTransport.java:366)
        at java.lang.Thread.run([email protected]/Thread.java:834)
{noformat}
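Paging through a full dump with "less" works, but the threads that sit in an accept loop can also be picked out mechanically. The following Python sketch is an illustration only: `accepting_threads` is a hypothetical helper, and the sample is hand-trimmed (module/version prefixes removed, and the "main" thread is made-up filler to show a non-matching case).

```python
import re

# Hand-trimmed sample in the shape of jstack output. The first thread is an
# abbreviated copy of the one above; the "main" thread is invented filler.
SAMPLE = '''\
"RMI TCP Accept-0" #26 daemon prio=9 os_prio=0 tid=0x00007fbe28005800 nid=0x1c5 runnable
   java.lang.Thread.State: RUNNABLE
        at java.net.ServerSocket.accept(ServerSocket.java:533)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)

"main" #1 prio=5 os_prio=0 tid=0x0000000000000001 nid=0x1 waiting on condition
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
'''

# A thread entry starts with its quoted name at the beginning of a line.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def accepting_threads(dump):
    """Return names of threads whose stack contains ServerSocket.accept."""
    names = []
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
        elif current and "ServerSocket.accept(" in line:
            names.append(current)
            current = None  # report each thread at most once
    return names


found = accepting_threads(SAMPLE)
print(found)
```

Against a real dump, the output of "jstack 419" would be passed in place of `SAMPLE`; the frames below the accept call (here the RMI transport) then hint at which subsystem owns the listener.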
I'll see if there is a way to lock that down. I'll also check whether port
37221, which is bound before Geode starts, changes the next time I spin up a
container.
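One plausible reason a listener lands outside Geode's configured range is that it was bound with port 0, which lets the kernel choose from its own ephemeral range (on Linux this is net.ipv4.ip_local_port_range, commonly 32768-60999 by default, which would cover both 33953 and 40053). Whether the RMI listener here is bound that way is an assumption, not something the stack dump proves. The Python sketch below only demonstrates the general mechanism; `bind_listener` is a hypothetical helper.

```python
import socket


def bind_listener(port=0):
    """Open a TCP listener on loopback; port 0 lets the kernel pick a port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock


# With port 0 the application has no say in which port is used, so the
# result can differ from run to run.
with bind_listener(0) as listener:
    chosen = listener.getsockname()[1]
    print("kernel-assigned port:", chosen)
```

Passing an explicit port instead of 0 pins the listener to a known value, which is the general shape of "locking it down", at the cost of a bind failure if that port is already taken.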
> CI failure: DualServerSNIAcceptanceTest fails to start server because port is
> in use
> ------------------------------------------------------------------------------------
>
> Key: GEODE-8730
> URL: https://issues.apache.org/jira/browse/GEODE-8730
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Darrel Schneider
> Assignee: Bill Burcham
> Priority: Major
>
> The run is here:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/AcceptanceTestOpenJDK8/builds/587]
> {noformat}
> org.apache.geode.client.sni.DualServerSNIAcceptanceTest > classMethod FAILED
> com.palantir.docker.compose.execution.DockerExecutionException:
> 'docker-compose exec -T geode gfsh run
> --file=/geode/scripts/geode-starter-2.gfsh' returned exit code 1
> The output was:
> 1. Executing - start locator --name=locator-maeve --connect=false
> --redirect-output --hostname-for-clients=locator-maeve
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=********
> --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> ...........................
> Locator in /locator-maeve on geode[10334] as locator-maeve is currently
> online.
> Process ID: 47
> Uptime: 16 seconds
> Geode Version: 1.14.0-build.0
> Java Version: 11.0.9.1
> Log File: /locator-maeve/locator-maeve.log
> JVM Arguments: -DgemfirePropertyFile=/geode/config/gemfire.properties
> -DgemfireSecurityPropertyFile=/geode/config/gfsecurity.properties
> -Dgemfire.enable-cluster-configuration=true
> -Dgemfire.load-cluster-configuration-from-dir=false
> -Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> -Dgemfire.OSProcess.DISABLE_REDIRECTION_CONFIGURATION=true
> Class-Path:
> /geode/lib/geode-core-1.14.0-build.0.jar:/geode/lib/geode-dependencies.jar
> 2. Executing - start server --name=server-dolores --group=group-dolores
> --hostname-for-clients=server-dolores --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=********
> --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> .......
> Server in /server-dolores on geode[40404] as server-dolores is currently
> online.
> Process ID: 199
> Uptime: 5 seconds
> Geode Version: 1.14.0-build.0
> Java Version: 11.0.9.1
> Log File: /server-dolores/server-dolores.log
> JVM Arguments: -DgemfirePropertyFile=/geode/config/gemfire.properties
> -DgemfireSecurityPropertyFile=/geode/config/gfsecurity.properties
> -Dgemfire.start-dev-rest-api=false -Dgemfire.locators=geode[10334]
> -Dgemfire.use-cluster-configuration=true -Dgemfire.groups=group-dolores
> -Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.14.0-build.0.jar:/geode/lib/geode-dependencies.jar
> 3. Executing - start server --name=server-clementine
> --group=group-clementine --hostname-for-clients=server-clementine
> --server-port=40405 --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=********
> --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
> ......The Cache Server process terminated unexpectedly with exit status
> 1. Please refer to the log file in /server-clementine for full details.
> Exception in thread "main" java.lang.RuntimeException: An IO error
> occurred while starting a Server in /server-clementine on geode[40405]:
> Network is unreachable; port (40405) is not available on localhost.
> at
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:852)
> at
> org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:737)
> at
> org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:256)
> Caused by: java.net.BindException: Network is unreachable; port (40405)
> is not available on localhost.
> at
> org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:142)
> at
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:794)
> ... 2 more
> ************************* Execution Summary ***********************
> Script file: /geode/scripts/geode-starter-2.gfsh
> Command-1 : start locator --name=locator-maeve --connect=false
> --redirect-output --hostname-for-clients=locator-maeve
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=/geode/config/gfsecurity.properties
> --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> Status : PASSED
> Command-2 : start server --name=server-dolores --group=group-dolores
> --hostname-for-clients=server-dolores --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=/geode/config/gfsecurity.properties
> --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> Status : PASSED
> Command-3 : start server --name=server-clementine
> --group=group-clementine --hostname-for-clients=server-clementine
> --server-port=40405 --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=/geode/config/gfsecurity.properties
> --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
> Status : FAILED
> at
> com.palantir.docker.compose.execution.Command.lambda$throwingOnError$12(Command.java:60)
> at
> com.palantir.docker.compose.execution.Command.execute(Command.java:50)
> at
> com.palantir.docker.compose.execution.DefaultDockerCompose.exec(DefaultDockerCompose.java:122)
> at
> com.palantir.docker.compose.execution.DelegatingDockerCompose.exec(DelegatingDockerCompose.java:86)
> at
> com.palantir.docker.compose.execution.RetryingDockerCompose.exec(RetryingDockerCompose.java:22)
> at
> com.palantir.docker.compose.DockerComposeRule.exec(DockerComposeRule.java:171)
> at
> org.apache.geode.client.sni.DualServerSNIAcceptanceTest.beforeClass(DualServerSNIAcceptanceTest.java:77)
> {noformat}