Re: Ignite failing to start with linkerd?

2020-12-18 Thread jbmassicotte
Alex, 

Thank you for your detailed suggestions. In the end I did not have to do any
deep debugging. I consulted the linkerd team, and they suggested a linkerd
configuration that restricts linkerd encryption to outgoing port 8080, the
port used between our client app and the grid. This leaves the connection
from the grid to the Kubernetes API unaltered. We no longer see the failures
I reported, and grid startup is much faster.
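The split the linkerd team suggested can be pictured as a per-port routing decision. The sketch below is a hypothetical model, not linkerd's API or configuration. It assumes that only outbound port 8080 (client app to grid) goes through the linkerd proxy and that every other port bypasses it. That includes the Kubernetes API (assumed here to be on 443) and Ignite's default discovery and communication ports, 47500 and 47100. In linkerd itself, one way to express such a split is its skip-outbound-ports setting, but the thread does not say exactly which option was used.

```java
import java.util.Set;

/** Hypothetical model of the port split described above; not linkerd's API. */
public class ProxyPortPolicy {
    // Assumption: 8080 is the client-app-to-grid port mentioned in the thread,
    // and it is the only outbound port routed through the linkerd proxy.
    private static final Set<Integer> PROXIED_PORTS = Set.of(8080);

    /** True if outbound traffic to this port would go through the proxy. */
    static boolean isProxied(int port) {
        return PROXIED_PORTS.contains(port);
    }

    public static void main(String[] args) {
        // 8080: client app -> grid (from the thread)
        // 443: typical Kubernetes API port (assumption)
        // 47500 / 47100: Ignite's default discovery / communication ports
        int[] ports = {8080, 443, 47500, 47100};
        for (int port : ports) {
            System.out.println(port + " -> " + (isProxied(port) ? "via linkerd (mTLS)" : "direct"));
        }
    }
}
```

The point of the model is that discovery and Kubernetes API traffic never meet the proxy, so nothing can intercept the discovery handshake.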




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite failing to start with linkerd?

2020-09-14 Thread jbmassicotte
Hello team,

We use linkerd (linkerd.io) to provide inter-pod SSL encryption in our Azure
Kubernetes cluster, as required by our organization. When we enabled linkerd
in our namespace, we observed that the Ignite pods crashed at startup,
restarted, and then succeeded in joining the grid on the second attempt.
Once connected, all is well.

We suspect the connection failure is related to
TcpDiscoveryKubernetesIpFinder, which communicates with the Kubernetes API
to retrieve the IPs of the grid nodes. With linkerd enabled, all outbound
traffic from a grid pod goes through a linkerd proxy before reaching its
destination (the API in this case). Since linkerd is not enabled at the
destination, that traffic should pass through the proxy unaffected. But
obviously, something is not quite right.

Here is a log from an impacted pod we were able to capture:

[2020-09-14 18:22:09,045][ERROR][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteException: Unable to establish secure connection. Was remote cluster configured with SSL? [rmtAddr=/10.244.6.100:47500, errMsg="Remote host terminated the handshake"]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendMessageDirectly(ServerImpl.java:1487)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1220)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1032)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1276)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2045)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1035)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:921)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:820)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:659)
    at org.apache.ignite.Ignition.start(Ignition.java:346)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)

As an FYI, we do not use linkerd to encrypt grid-node-to-grid-node
connections; linkerd is only meant to encrypt HTTP traffic. In our solution,
that is the traffic between the frontend NGINX pods and the backend Ignite
pods.

So my questions are:
* Does anybody have experience using Ignite with linkerd in a Kubernetes
cluster, and if so, have you observed this problem?
* What might cause the connection failure?
* What might fix the above problem?






Re: Ignite failing to start with linkerd?

2020-09-14 Thread akorensh
Hi,
  The K8s IP finder does the equivalent of "kubectl get endpoints"
  and then tries to discover the corresponding nodes based on the results.

  see:
https://github.com/apache/ignite/blob/513afe4dabbaa1c2853a76ff02e58f4a7db01076/modules/kubernetes/src/main/java/org/apache/ignite/spi/discovery/tcp/ipfinder/kubernetes/TcpDiscoveryKubernetesIpFinder.java#L139
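To picture that step: the finder reads the Endpoints object of the service and turns each listed pod IP into an address that discovery will try to connect to. The sketch below is a simplified, hypothetical model of that mapping, not the IP finder's code. The Subset class stands in for one "subset" of the Endpoints JSON and keeps only the IPs under "addresses". Pairing each IP with port 47500 (TcpDiscoverySpi's default) is an assumption for illustration; whether the real finder attaches the port itself or leaves it to TcpDiscoverySpi is a detail to check in the linked source.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class EndpointsToAddresses {
    /** Hypothetical, simplified view of one subset of a Kubernetes Endpoints object. */
    static final class Subset {
        final List<String> addressIps;

        Subset(List<String> addressIps) {
            this.addressIps = addressIps;
        }
    }

    // Assumption: TcpDiscoverySpi's default discovery port.
    static final int DISCOVERY_PORT = 47500;

    /** Flattens every IP from every subset into a discovery socket address. */
    static List<InetSocketAddress> toDiscoveryAddresses(List<Subset> subsets) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (Subset subset : subsets) {
            for (String ip : subset.addressIps) {
                // IP literals are parsed directly; no DNS lookup is needed.
                result.add(new InetSocketAddress(ip, DISCOVERY_PORT));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data: the first IP comes from the log in this thread, the second is made up.
        List<Subset> subsets = List.of(new Subset(List.of("10.244.6.100", "10.244.6.101")));
        for (InetSocketAddress addr : toDiscoveryAddresses(subsets)) {
            System.out.println(addr.getHostString() + ":" + addr.getPort());
        }
    }
}
```

If the endpoints returned by the API differ from run to run, or include pods that are not yet ready, the list of addresses discovery tries will differ too. That is why checking the endpoints is a useful first step.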


   I would suggest debugging the relevant services to make sure that the
endpoints are correct on every run and reflect the relevant pods.

  The stack trace suggests that the discovery join request sent to port
47500 is being intercepted and modified in some way, since the remote side
terminated the handshake.
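The failing call in the trace is the discovery join request to 10.244.6.100:47500, and the error says the remote side terminated the TLS handshake. You can check what is actually answering on that port from inside the same pod with a bare TLS handshake probe. This is a hypothetical diagnostic sketch that uses only the JDK's javax.net.ssl classes; it is not an Ignite or linkerd tool. The default host and port are assumptions for your setup. It validates certificates against the JVM's default trust store, so a peer with a self-signed certificate also reports a failure; the printed exception tells those cases apart.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsProbe {
    /**
     * Returns true if a TLS handshake with host:port completes within the timeout,
     * false if the TCP connect or the handshake fails (the exception is printed).
     */
    static boolean tlsHandshakeSucceeds(String host, int port, int timeoutMs) {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // bound the handshake reads as well
            socket.startHandshake();
            return true;
        } catch (IOException e) {
            // e.g. connection refused, timeout, or an SSLHandshakeException
            System.out.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions: loopback and Ignite's default discovery port.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 47500;
        System.out.println("TLS handshake ok: " + tlsHandshakeSucceeds(host, port, 3000));
    }
}
```

For example, running `java TlsProbe 10.244.6.100 47500` from a grid pod with and without the linkerd sidecar should show whether the handshake outcome changes when the proxy is in the path.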

   I would simplify the scenario to the bare minimum, one pod and one
external consumer, and then monitor all network traffic to see what happens
during each connection attempt.


  see:   https://apacheignite.readme.io/docs/ignite-service
   https://apacheignite.readme.io/docs/microsoft-azure-deployment

   Also take a look at externalTrafficPolicy to see whether it makes a
difference in your config,
   as K8s can mask the client source IPs, and in conjunction with linkerd
that might affect your app.
 
https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip


Thanks, Alex




