lipeining opened a new issue #181:
URL: https://github.com/apache/pulsar-client-node/issues/181
os: Ubuntu 16.04.7 LTS
nodejs: v14.17.0
pulsar-client: 1.3.0
pulsar c++ client: 2.7.1
framework: midway.js
on our web server written with nodejs, when the server start, it will use
pulsar-client(we clone a copy of branch-1.3) connect to our pulsar service and
subscribe from some topics. but, with some bug, the server crash down after
subscribe on a pulsar topic.would you please take sometime to find the crash
reason, thanks a lot.
our web server subscribe on five or six topics, also, it had been work well
done for a long time. but, without change a line of js code, it crash down.
we spend some time to track what's happening.
when the node.js process crash down, we got the crash process core dump file,
here we got some log about why pulsar-client crash down.
### gcore
`gcore -c corefileName`
with help of gcore and core dump file, we get some crash stack like this
```txt
thread #1: tid = 40071, 0x00007fffdfcd4af9
Pulsar.node`MessageListenerProxy(Napi::Env, Napi::Function,
MessageListenerProxyData*) + 73, name = 'node', stop reason = signal SIGSEGV:
invalid address (fault address: 0x10)
frame #0: 0x00007fffdfcd4af9 Pulsar.node`MessageListenerProxy(Napi::Env,
Napi::Function, MessageListenerProxyData*) + 73
* frame #1: 0x00007fffdfcd5a37 Pulsar.node`std::_Function_handler<void
(Napi::Env, Napi::Function), napi_status
Napi::ThreadSafeFunction::BlockingCall<MessageListenerProxyData, void
(*)(Napi::Env, Napi::Function,
MessageListenerProxyData*)>(MessageListenerProxyData*, void (*)(Napi::Env,
Napi::Function, MessageListenerProxyData*)) const::'lambda'(Napi::Env,
Napi::Function)>::_M_invoke(std::_Any_data const&, Napi::Env&&,
Napi::Function&&) + 23
frame #2: 0x00007fffdfcbfbf2
Pulsar.node`Napi::ThreadSafeFunction::CallJS(napi_env__*, napi_value__*, void*,
void*) + 130
frame #3: 0x00000000009f3ab8 node`v8impl::(anonymous
namespace)::ThreadSafeFunction::IdleCb(uv_idle_s*) + 328
frame #4: node`uv__run_idle(loop=0x00000000044b0800) at loop-watcher.c:68
frame #5: node`uv_run(loop=0x00000000044b0800, mode=UV_RUN_DEFAULT) at
core.c:378
frame #6: 0x0000000000a625f4 node`node::NodeMainInstance::Run() + 580
frame #7: 0x00000000009f0ba5 node`node::Start(int, char**) + 277
frame #8: libc.so.6`__libc_start_main(main=(node`main), argc=3,
argv=0x00007fffffff6778, init=<unavailable>, fini=<unavailable>,
rtld_fini=<unavailable>, stack_end=0x00007fffffff6768) at libc-start.c:291
frame #9: 0x000000000098080c node`_start + 41
```
beacuse we copy a source code of [branch-1.3],so we add some log to find out
why it would crash on function **MessageListenerProxy**, we add some log on
this function.
```c++
void MessageListenerProxy(Napi::Env env, Napi::Function jsCallback,
MessageListenerProxyData *data) {
printf("-->%s:%d:%p\n", __func__, __LINE__, data);
printf("-->%s:%d:%p\n", __func__, __LINE__, data->cMessage);
printf("-->%s:%d:%p\n", __func__, __LINE__, data->consumer);
if (data->consumer == NULL) {
printf("--consumer is nil-->%s:%d:%p\n", __func__, __LINE__,
data->consumer);
return;
}
Napi::Object msg = Message::NewInstance({}, data->cMessage);
Consumer *consumer = data->consumer;
delete data;
jsCallback.Call({msg, consumer->Value()});
}
```
just find out the data->consumer is a nullptry, so the consumer->Value()
would crash down with log
invalid address (fault address: 0x10)

### possible crash reason
after analyse pulsar-client source code, we find out the data->consumer is
init inside
ConsumerNewInstanceWorker::OnOK----->>>>>Consumer::SetListenerCallback
https://github.com/apache/pulsar-client-node/blob/branch-1.3/src/Consumer.cc#L89
the MessageListener.prams.ctx is a ListenerCallback,
the ctx is also alias ConsumerConfig->listener, which will pass to a
Consumer->listener,
and more, Consumer::SetListenerCallback will make sure
ListenerCallback->consumer will be properly inited.
so far by now,we can assume, why data->consumer is nullptry? because it had
not been init.
ConsumerNewInstanceWorker init the data->consumer on another thread, and
MessageListener use data->consumer in main thread.
is it possible the data->consumer is not yet init before the MessageListener
use it?
because this two thread can not make sure the init order.
### here is the Code
```js
const Pulsar = require('@yy/sl-pulsar-client');
const subscription = 'test-sub-fork5';
export default agent => {
console.log('========== plugin agent ready ================');
const clients = {};
// just use fake [pulsarUrl, topicName, funcName]
const Clients = {
clientNameTest: {
ClientConfig: {
serviceUrl: 'pulsarUrl',
},
ConsumerConfigs: [
{
topic: 'topicName',
HandleMethod: 'funcName',
},
],
},
};
async function connect() {
for (const clientName of Object.keys(Clients)) {
const clientConfig = Clients[clientName];
const client = new Pulsar.Client(clientConfig.ClientConfig);
clients[clientName] = client;
await connectClient(clientName, client);
}
}
async function connectClient(clientName, client) {
const defaultConfig = {
subscriptionType: 'Shared',
subscription,
};
const clientConfig = Clients[clientName];
for (const config of clientConfig.ConsumerConfigs) {
const consumerConfig = { ...defaultConfig, ...config };
const { topic, subscription, HandleMethod } = consumerConfig;
try {
await client.subscribe({
...consumerConfig,
listener: (msg, res) => {
const msgData = msg.getData().toString();
const payload = parseMessagePayload({}, msgData);
// fs.appendFileSync('./msg', JSON.stringify({ topic, msgData
}));
console.log(
'[egg-pulsar] 接受到消息 topic: %s, subscription: %s, handleMethod:
%s, msg: %s',
topic,
subscription,
HandleMethod,
msgData,
payload
);
res.acknowledge(msg);
},
});
console.log(
'[egg-pulsar] consumer 订阅成功 topic: %s, subscription: %s',
topic,
subscription
);
} catch (e) {
console.log('e==>', e);
}
}
}
connect();
agent.messenger.on('egg-ready', () => {
console.log('plugin egg-ready!!!!!!!!!');
});
};
function parseMessagePayload(agent, payload) {
try {
// 校验消息格式
return JSON.parse(payload);
} catch (e) {
return console.log('[egg-pulsar] payload 消息格式非法', payload.toString());
}
}
```
### Pulsar Log
```txt
2021-11-27 21:35:59.044 INFO [139917541496576] Client:88 | Subscribing on
Topic :persistent:fakeTopicName
2021-11-27 21:35:59.045 INFO [139917541496576] ConnectionPool:85 | Created
connection for pulsar://fakePulsarUrl
2021-11-27 21:35:59.045 INFO [139917502453504] ClientConnection:356 |
[10.86.46.192:40820 -> 10.86.5.121:8501] Connected to broker
// /..
// just crash down
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]