aweiri1 commented on issue #24819:
URL: https://github.com/apache/pulsar/issues/24819#issuecomment-3560222940
@lhotari I am still working on this issue and have documented my findings
below.
I believe I have narrowed the problem down to cross-cluster communication
for the replicators.
As we know, geo-replication is only partially working:
**Issue recap:**
- Two Pulsar Kubernetes clusters set up (okd1 and talos)
- Manually updated the cluster URLs to the load balancer IPs and restarted
  the brokers successfully
- Producers and consumers can use the cluster URLs successfully
- Enabled geo-replication on the replication tenant/namespace
- Produced 100 messages on the talos cluster with no consumers running on
  either cluster
- This immediately triggers a connection/internal-server error in the talos
  broker logs
- The topic never exists on the okd1 cluster
- Started a consumer on the okd1 cluster, and it receives the 100 messages
**Result:** geo-replication is partially working, with an odd topic
replication issue. The topic itself never replicates, even though the
messages do replicate once a consumer starts.
**Theory:** by default, and behind the scenes, the talos replicator
communicates with the okd1 broker directly, NOT with the LB/proxy. That would
explain why the error log shows the connection error against the internal
okd1 broker DNS name. Cross-cluster communication with the brokers cannot
work because they expose only internal endpoints, not external ones: the
replicator is trying to connect over the internal pod network of okd, which
won't work. What we have been using is a proxy service in Kubernetes that
sits behind a load balancer with an external IP, and this is the only
external IP in our Kubernetes cluster setup. I am looking into Pulsar
configurations that would resolve this, but I am not sure whether the cause
is our Kubernetes setup.
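
To make the reachability point concrete, here is a minimal Python sketch that flags hostnames which only resolve inside a Kubernetes cluster. It assumes the default `cluster.local` cluster domain, and both addresses are hypothetical placeholders, not values taken from the actual logs.

```python
# Suffixes that normally resolve only through a cluster's internal DNS.
# Assumption: the cluster uses the default "cluster.local" domain.
INTERNAL_SUFFIXES = (".svc", ".svc.cluster.local", ".cluster.local")


def is_cluster_internal(host: str) -> bool:
    """Return True if the hostname looks like a Kubernetes-internal DNS name."""
    return host.rstrip(".").lower().endswith(INTERNAL_SUFFIXES)


# Hypothetical addresses for illustration only.
broker_dns = "pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local"
lb_address = "203.0.113.10"

for address in (broker_dns, lb_address):
    if is_cluster_internal(address):
        print(f"{address}: internal only, not resolvable from another cluster")
    else:
        print(f"{address}: not a cluster-internal name")
```

If the address in the replicator's connection error matches the first case, that is consistent with the theory above, though it does not prove it.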
Configurations in broker.conf that may help resolve the issue:
- advertisedAddress
  - I changed advertisedAddress to our LB IP in broker.configData in our
    Helm chart, and it breaks the deployment: the pods are unable to come up.
- advertisedListeners
- internalListenerName
- bindAddress
- createTopicToRemoteClusterForReplication
  - This is set to true for our brokers, so this shouldn't be the issue.
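
For reference, as I understand the broker configuration docs, `advertisedListeners` takes comma-separated `<listenerName>:<scheme>://<host>:<port>` entries, and `internalListenerName` selects the entry brokers use among themselves. The Python sketch below parses such a value to show the shape of the setting. The listener names `internal` and `external` and both addresses are hypothetical, and whether this setting actually fixes the replication error is untested.

```python
from urllib.parse import urlparse


def parse_advertised_listeners(value: str) -> dict[str, list[str]]:
    """Map each listener name to the service URLs advertised under it."""
    listeners: dict[str, list[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, since the URL itself contains colons.
        name, _, url = entry.partition(":")
        if urlparse(url).scheme not in ("pulsar", "pulsar+ssl"):
            raise ValueError(f"unexpected listener URL: {url!r}")
        listeners.setdefault(name, []).append(url)
    return listeners


# Hypothetical value: one listener for in-cluster traffic, one for the LB.
value = (
    "internal:pulsar://pulsar-broker-0.pulsar-broker.pulsar.svc.cluster.local:6650,"
    "external:pulsar://203.0.113.10:6650"
)
listeners = parse_advertised_listeners(value)
print(listeners["external"])
```

In this sketch, `internalListenerName=internal` would keep broker-to-broker traffic on the pod network, while the `external` entry carries the address a remote cluster could reach.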
Any feedback you may have is appreciated!