aweiri1 opened a new issue, #24819:
URL: https://github.com/apache/pulsar/issues/24819

   ### Search before reporting
   
   - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [x] I understand that [unsupported 
versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions)
 don't get bug fixes. I will attempt to reproduce the issue on a supported 
version of Pulsar client and Pulsar broker.
   
   
   ### User environment
   
   pulsar version: 4.04
   helm chart version: 4.0.1
   running two Kubernetes Pulsar clusters: an OpenShift cluster and a Talos cluster
   
   Linux pulsar-talos-toolset-0 6.12.25-talos #1 SMP Mon Apr 28 10:05:42 UTC 
2025 x86_64 GNU/Linux
   Linux pulsar-okd1-toolset-0 6.3.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC 
Thu Jul  6 04:05:18 UTC 2023 x86_64 GNU/Linux
   
   using the Python client for the test producer/consumer
   Python 3.10.12
   
   
   ### Issue Description
   
   I am trying to configure geo-replication for a two-cluster Pulsar setup.
   
   I have an okd1 Pulsar cluster running on an OpenShift Kubernetes cluster, a talos Pulsar cluster running on a Talos Kubernetes cluster, and a global config/metadata store running as a ZooKeeper-only Pulsar cluster, also on the Talos Kubernetes cluster.
   
   Each cluster uses a proxy service with an external load-balancer IP, which is what I use for configuration.
   
   I have a two-cluster geo-replication setup deployed via Kubernetes. When I run a producer on cluster A for the first time (relying on auto topic creation - I did not create the topic manually), the producer runs successfully and sends the messages on the cluster A service URL.
   Before starting a consumer on cluster B for that same topic, I wanted to check the topic stats on cluster B, but it says the topic doesn't exist. Once I start the consumer on the cluster B service URL, it receives the messages successfully. But I thought topics were supposed to be replicated across clusters? I waited over 30 minutes for the topic to show up on cluster B, to rule out timing issues, but it never appeared until the consumer was started on cluster B.
   
   When I connect the cluster B consumer, it receives all of the messages from the cluster A producer.
   Until I start that cluster B consumer, the topic does not exist on cluster B and none of the messages are visible on cluster B.
   
   This only happens when the topic is first auto-created by the cluster A producer. Once a consumer runs for the first time on cluster B (which creates the topic for cluster B), I don't hit this issue on that topic again, since it has already been created. From that point on, the consumer on cluster B does not need to be running for me to see messages sitting in the backlog.
   
   ### Error messages
   
   ```text
   an error we get in the cluster A broker logs immediately after the producer send:

   2025-09-09T22:20:32,547+0000 [broker-client-shared-scheduled-executor-7-1] WARN  org.apache.pulsar.client.impl.PulsarClientImpl - [topic: persistent://geo-replication-2/testing/__change_events] Could not get connection while getPartitionedTopicMetadata -- Will try again in 754 ms

   pulsar-talos-broker 2025-09-09T22:20:32,551+0000 [pulsar-io-3-15] ERROR org.apache.pulsar.client.impl.ClientCnx - [id: 0x714b44aa, L:/ - R:] Close connection because received internal-server error {"errorMsg":"","reqId":1946041700241505712, "remote":"pulsar-okd1-broker.pulsar.svc.cluster.local/, "local":"/"}

   pulsar-talos-broker 2025-09-09T22:20:32,552+0000 [pulsar-io-3-15] WARN  org.apache.pulsar.client.impl.BinaryProtoLookupService - [persistent://geo-replication-2/testing/__change_events] failed to get Partitioned metadata : {"errorMsg":"{"errorMsg":"","reqId":1946041700241505712, "remote":"pulsar-okd1-broker.pulsar.svc.cluster.local/", "local":"/"}","reqId":1229027975051488438, "remote":"", "local":"/"}

   pulsar-talos-broker org.apache.pulsar.client.api.PulsarClientException$LookupException: {"errorMsg":"{"errorMsg":"","reqId":1946041700241505712, "remote":"pulsar-okd1-broker.pulsar.svc.cluster.local/", "local":"/"}","reqId":1229027975051488438, "remote":"", "local":"/"}
   I assumed this could also be a timing issue, because the lookup immediately tries to find the topic on cluster B (okd1) while it does not yet exist.
   
   topic stats on cluster A show the following in the replication field: 
   
   "replication" : {
       "pulsar-okd1" : {
         "msgRateIn" : 0.0,
         "msgInCount" : 0,
         "msgThroughputIn" : 0.0,
         "bytesInCount" : 0,
         "msgRateOut" : 0.0,
         "msgOutCount" : 0,
         "msgThroughputOut" : 0.0,
         "bytesOutCount" : 0,
         "msgRateExpired" : 0.0,
         "replicationBacklog" : 100,
         "connected" : false,
         "replicationDelayInSeconds" : 0,
         "msgExpiredCount" : 0
       }
   }
   
   I did enable debug logging on both clusters. There are no topic-creation log entries, but the debug logs on okd1 show the metadata lookup for the topic created on the talos cluster:
   
   
   2025-10-02T23:16:33,265+0000 [pulsar-io-3-5] DEBUG 
org.apache.pulsar.broker.service.BrokerService - No autoTopicCreateOverride 
policy found for persistent://geo-replication/testing/test
   2025-10-02T23:16:33,471+0000 [pulsar-io-3-8] DEBUG 
org.apache.pulsar.broker.service.ServerCnx - 
[persistent://geo-replication/testing/test] Received PartitionMetadataLookup 
from /10.128.2.45:43198 for 770175375804561621
   2025-10-02T23:16:33,471+0000 [pulsar-io-3-8] DEBUG
   ```
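As a side note for anyone scripting this check: the combination of `"connected": false` and a non-zero `replicationBacklog` in the topic stats above is what identifies the stuck replicator. A minimal stdlib-only sketch (the JSON below is abridged from the stats fragment quoted earlier):

```python
import json

# Abridged from the `pulsar-admin topics stats` output quoted above;
# only the fields this check needs are kept.
stats = json.loads("""
{
  "replication": {
    "pulsar-okd1": {
      "replicationBacklog": 100,
      "connected": false
    }
  }
}
""")

# A replicator that is disconnected while messages sit in its backlog is
# exactly the state reported in this issue.
stuck = [
    cluster
    for cluster, repl in stats.get("replication", {}).items()
    if not repl["connected"] and repl["replicationBacklog"] > 0
]
print(stuck)  # ['pulsar-okd1']
```

The same check works against the full stats document returned by `pulsar-admin topics stats` or the admin REST stats endpoint.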
   
   ### Reproducing the issue
   
   I am using 3 Kubernetes clusters. One of them is the ZooKeeper-only cluster that serves as the global metadata store for the okd1 and talos clusters. The other two are full Pulsar clusters, each fronted by a proxy. The proxy is a Kubernetes service backed by a load balancer with an external IP, and that external IP is what I use as my Pulsar service URL.
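For concreteness, the URLs derived from that external IP look like this (the IP below is a placeholder, and the ports are the Pulsar defaults):

```python
# Hypothetical external IP for a proxy LoadBalancer service; substitute the
# real IP reported by `kubectl get svc` for each cluster.
lb_ip = "203.0.113.10"

# Pulsar defaults: 6650 for the binary protocol, 8080 for the HTTP
# admin/lookup service.
broker_service_url = f"pulsar://{lb_ip}:6650"
http_service_url = f"http://{lb_ip}:8080"

print(broker_service_url)  # pulsar://203.0.113.10:6650
print(http_service_url)    # http://203.0.113.10:8080
```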
   On the talos cluster I run the following to enable geo-replication:
   
   ```shell
   bin/pulsar-admin tenants create geo-replication --allowed-clusters pulsar-okd1,pulsar-talos

   bin/pulsar-admin namespaces create geo-replication/testing

   bin/pulsar-admin namespaces set-clusters geo-replication/testing --clusters pulsar-talos,pulsar-okd1
   ```
   
   Since we're using the global config store, the cluster and tenant already exist on the okd1 cluster; all I did was set the clusters on the tenant/namespace.
   
   I did pass in the right service URL for okd1 by doing a clusters update on the okd1 cluster, updating the URLs to use the load-balancer IP address, and then restarting the brokers (I did this for both clusters).
   I am not using any authentication credentials on either cluster.
   I do have permission to create a topic in that namespace; I verified this by running a topic create command against it.
   
   ### Additional information
   
   After more discussion with David K in the Slack channel, he concluded:
   
   It sounds like the metadata on cluster B doesn’t get updated until a 
consumer attaches to the replicated topic even though the underlying topic data 
is there. This behavior is wrong, and the topic should exist in the target 
cluster’s metadata.
   
   
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
