[ https://issues.apache.org/jira/browse/IGNITE-11842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840101#comment-16840101 ]
Aquilino Viveiros edited comment on IGNITE-11842 at 5/15/19 6:56 AM:
---------------------------------------------------------------------

There are definitely connectivity problems when using Ignite 2.7 (other versions might be affected) on Kubernetes. Here are a few details:

*Kubernetes version*
{code:java}
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
{code}

*Java Microservices (Clients, containers on K8s)*
{code:java}
// containers base image (java jre with musl)
java:8-jre-alpine
{code}
{code:java}
/ # java -version
openjdk version "1.8.0_111-internal"
OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
{code}
{code:java}
/ # ldd /usr/bin/java
    /lib/ld-musl-x86_64.so.1 (0x7f77e34f9000)
Error loading shared library libjli.so: No such file or directory (needed by /usr/bin/java)
    libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f77e34f9000)
Error relocating /usr/bin/java: JLI_Launch: symbol not found
{code}

*Official Docker Image Ignite 2.7 (Server, containers on K8s)*
{code:java}
/opt/ignite # java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (IcedTea 3.9.0) (Alpine 8.181.13-r0)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
{code}
{code:java}
/opt/ignite # ldd /usr/bin/java
    /lib/ld-musl-x86_64.so.1 (0x7f80fe498000)
Error loading shared library libjli.so: No such file or directory (needed by /usr/bin/java)
    libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f80fe498000)
Error relocating /usr/bin/java: JLI_Launch: symbol not found
{code}

With the above setup, the first 2-4 clients would connect fine, but the more clients we added, the more they struggled to connect to the server, particularly once we reached around 10 connected clients. Visor would work fine with up to 2-4 clients; after that, Visor would not connect at all.

After a bit of reading, we suspected that the problem was Alpine with musl. We changed all our microservices to use another base image, Alpine with glibc.

*Java Microservices (Clients, containers on K8s)*
{code:java}
// containers base image (java jre with glibc)
adoptopenjdk/openjdk8:alpine-slim
{code}
{code:java}
/ # java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)
{code}
{code:java}
/ # ldd /opt/java/openjdk/bin/java
    /lib64/ld-linux-x86-64.so.2 (0x7f4efc05d000)
    libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f4efc05d000)
    libjli.so => /opt/java/openjdk/bin/../lib/amd64/jli/libjli.so (0x7f4efbe46000)
    libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f4efc05d000)
    libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f4efc05d000)
Error relocating /opt/java/openjdk/bin/../lib/amd64/jli/libjli.so: __rawmemchr: symbol not found
{code}

With the above, the clients (~10 clients) would now connect to the server. Visor, however, would again struggle to connect.

We went a bit further and built a custom Ignite 2.7 Docker image based on adoptopenjdk/openjdk8:alpine-slim, using [https://github.com/apache/ignite/tree/2.7.0/docker/apache-ignite] as the setup.

*Custom Docker Image Ignite 2.7 (Server, containers on K8s)*
{code:java}
/opt/ignite # java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)
{code}
{code:java}
/opt/ignite # ldd /opt/java/openjdk/bin/java
    /lib64/ld-linux-x86-64.so.2 (0x7fadc3193000)
    libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7fadc3193000)
    libjli.so => /opt/java/openjdk/bin/../lib/amd64/jli/libjli.so (0x7fadc2f7c000)
    libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7fadc3193000)
    libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7fadc3193000)
Error relocating /opt/java/openjdk/bin/../lib/amd64/jli/libjli.so: __rawmemchr: symbol not found
{code}

With this change, clients, server and Visor showed no network/connectivity problems (scaled to 10 clients).

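To check quickly which libc a given container's JVM is linked against, the dynamic loader path in the {{ldd}} output is enough. Below is a minimal sketch, assuming the loader paths follow the naming seen in the outputs above ({{/lib/ld-musl-*.so.1}} for musl, {{/lib64/ld-linux-*.so.2}} for glibc). The function name {{classify_libc}} is hypothetical, and the script only classifies text; it does not run {{ldd}} itself.

```python
import re

# Loader-path patterns taken from the ldd outputs in this comment.
# Assumption: other images follow the same naming convention.
MUSL_LOADER = re.compile(r"/lib/ld-musl-[\w.-]+\.so\.1")
GLIBC_LOADER = re.compile(r"/lib(64)?/ld-linux-[\w.-]+\.so\.2")


def classify_libc(ldd_output: str) -> str:
    """Return 'musl', 'glibc', or 'unknown' based on the loader path."""
    if MUSL_LOADER.search(ldd_output):
        return "musl"
    if GLIBC_LOADER.search(ldd_output):
        return "glibc"
    return "unknown"


# First line of each ldd output quoted in this comment.
musl_sample = "/lib/ld-musl-x86_64.so.1 (0x7f77e34f9000)"
glibc_sample = "/lib64/ld-linux-x86-64.so.2 (0x7f4efc05d000)"

print(classify_libc(musl_sample))
print(classify_libc(glibc_sample))
```

In practice the input would be the captured output of {{ldd}} on the {{java}} binary inside each client and server container.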
> clients fails to connect
> ------------------------
>
>                 Key: IGNITE-11842
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11842
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.7
>         Environment: kubernetes
>            Reporter: James
>            Priority: Major
>
> The main symptom is that clients are failing to connect to the Ignite
> cluster, with reported timeouts in the logs.
> The main fact we have is this (from within the client, inside a Kubernetes
> container on Linux):
> / # netstat -ntp
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address          State       PID/Program name
> tcp   215796      0 ::ffff:10.42.2.97:43666 ::ffff:10.42.3.170:47500 ESTABLISHED 13/java
>
> Namely, the application is failing to read data from the TCP socket. Notice
> the “Recv-Q” of 215796.
>
> This could be a client application, but the same thing also happens with
> ignitevisor.sh.
> Downgrading to Apache Ignite 2.3 resolves the problem.
> Tests so far:
> 2.7 intermittently fails to connect to the Ignite cluster.
> 2.3 seems OK.
> 2.6 also fails after a number of clients have connected successfully.
>
> Has anyone else seen this?
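The Recv-Q symptom described in the issue can be checked across many client pods by parsing the {{netstat -ntp}} output. The following is a rough sketch, not part of the original report: the names {{parse_netstat}} and {{stalled}} are hypothetical, and the 65536-byte threshold is an arbitrary assumption that would need tuning for a real deployment.

```python
from typing import List, NamedTuple


class TcpConn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    remote: str
    state: str


def parse_netstat(text: str) -> List[TcpConn]:
    """Parse the TCP rows of `netstat -ntp` output, skipping header lines."""
    conns = []
    for line in text.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[1].isdigit() and fields[2].isdigit()):
            conns.append(TcpConn(int(fields[1]), int(fields[2]),
                                 fields[3], fields[4], fields[5]))
    return conns


def stalled(conns: List[TcpConn], threshold: int = 65536) -> List[TcpConn]:
    """Return connections whose receive queue exceeds the (assumed) threshold."""
    return [c for c in conns if c.recv_q > threshold]


# Sample taken from the netstat output quoted in the issue.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 215796 0 ::ffff:10.42.2.97:43666 ::ffff:10.42.3.170:47500 ESTABLISHED 13/java
"""

for conn in stalled(parse_netstat(SAMPLE)):
    print(conn.remote, conn.recv_q)
```

A large, non-draining Recv-Q on a connection to port 47500 indicates that the process is not reading from the socket, which matches the behaviour reported above.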