On Thu, Jun 03, 2021 at 11:19:32PM +0000, Maslanka, Pawel wrote:
> Hi BIRD team!
> 
> We found a case when BMP code is trying to connect with BMP collector service 
> with sk_open(), this causes increasing CPU utilization. To reproduce this 
> case, you have just:
> 
>   1.  Server machine where BMP PDU packets will be sent, should be reachable 
> (so it can be pinged).
>   2.  BMP collector service itself should not be running on this server.
>   3.  Run BIRD with enabled BMP protocol.
> 
> After that you should observe that BIRD process has significantly increased 
> CPU utilization. This is related somehow with “BIRD socket” because when I 
> capture network traffic on host machine (where BIRD is running), I can see 
> massive amount of TCP packets which are exchange between BIRD host machine 
> and BMP collector machine. At the moment socket type related with BMP 
> connection is SK_TCP_ACTIVE.
> Do you have any idea what is going wrong or how BIRD socket should be 
> properly use?

Hi

After a failed connect() attempt, the socket err_hook is called. In that
hook you are supposed to close the socket and either disable the
protocol or set up a timeout to restart connect attempts. See
rpki_err_hook() or bgp_sock_err(). Otherwise, the BIRD socket layer
would try to connect() again immediately.

This part is missing from bmp_sock_err() in our bmp branch; I should
fix that. It is still WiP.
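
A rough sketch of the pattern, to show the control flow only. It uses
stand-in types and hypothetical names (mock_sock, mock_bmp, mock_connect,
mock_sock_err, mock_retry_timer), not the real BIRD socket or timer API,
so take it as an illustration under those assumptions and not as the
actual fix:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a BIRD socket and a BMP protocol instance. */
struct mock_sock { int open; };

struct mock_bmp {
  struct mock_sock *sk;     /* active TCP socket, NULL when closed */
  int retry_armed;          /* 1 if a reconnect timer is pending */
  int connect_attempts;     /* how many connect attempts were started */
};

static struct mock_sock sock_storage;

/* Start one connect attempt (stands in for opening an SK_TCP_ACTIVE socket). */
static void mock_connect(struct mock_bmp *p)
{
  assert(p->sk == NULL);    /* never open a second socket over a live one */
  sock_storage.open = 1;
  p->sk = &sock_storage;
  p->retry_armed = 0;
  p->connect_attempts++;
}

/* Error hook: err != 0 is a failure, err == 0 is a regular close. */
static void mock_sock_err(struct mock_bmp *p, int err)
{
  (void) err;               /* both cases are handled the same way here */
  p->sk->open = 0;          /* close the socket first */
  p->sk = NULL;
  p->retry_armed = 1;       /* arm a timer instead of reconnecting at once */
}

/* Timer hook: runs after the retry delay and starts the next attempt. */
static void mock_retry_timer(struct mock_bmp *p)
{
  if (p->retry_armed)
    mock_connect(p);
}
```

The point is that the hook itself never reconnects; the next attempt
happens only when the timer fires, so an unreachable collector cannot
turn into a busy loop.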

> I need also a tip if there is a way to get notification from BIRD
> socket if we lost connection with BMP collector service? One option is to

If a connection is closed regularly, then socket err_hook is called,
but with err=0. See bgp_sock_err(). In most cases the handling would
be similar to an actual error (try to re-establish connection after
some timeout).

> Currently we have switched to BMP code provided on bmp branch from gitlab 
> BIRD repo.
> 
> Additionally I have a question referring to enclosed code. Can I free list 
> node and node data itself when sk_send() returns value greater or equal to 0 
> (>= 0), like in the below code?
> 
>   WALK_LIST_DELSAFE(tx_data, tx_data_next, p->tx_queue)
>   {
>     ...
>     rv = sk_send(p->sk, data_size);
>     if (rv < 0) {
>       return;
>     }
> 
>     mb_free(tx_data->data);
>     rem_node((node *) tx_data);
>     mb_free(tx_data);
>     if (rv == 0) {
>       return;
>     }
>     ...
> 
> Or I should to do that only if sk_send() return value greater than 0 (> 0) ? 
> My goal is sending all data from list if there was only "temporary" problem 
> with sk_send().

This looks OK. If sk_send() returns > 0, data were sent, you can free the
data and continue the loop. If sk_send() returns 0, data were not sent,
but they stay in sk->tbuf, so you can free the data from your tx_queue,
and break the loop and wait for tx_hook to happen again.
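
The three cases can be sketched in isolation like this. The queue type
and all names (tx_item, mock_tx, mock_send, tx_push, tx_drain) are
hypothetical stand-ins, and mock_send() only mimics the return values of
sk_send() discussed above; it is not the real function:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queue node; stands in for an entry of p->tx_queue. */
struct tx_item { struct tx_item *next; void *data; };

struct mock_tx {
  struct tx_item *head;     /* pending items */
  int send_rv;              /* value the mock send returns */
  int freed;                /* number of items released so far */
};

/* Mimics the send result: > 0 sent, 0 kept in the buffer, < 0 error. */
static int mock_send(struct mock_tx *q) { return q->send_rv; }

/* Add one item with a small dummy payload. */
static void tx_push(struct mock_tx *q)
{
  struct tx_item *it = malloc(sizeof *it);
  assert(it != NULL);
  it->data = malloc(16);
  assert(it->data != NULL);
  it->next = q->head;
  q->head = it;
}

/* Returns 1 if the queue was drained, 0 if we must wait, < 0 on error. */
static int tx_drain(struct mock_tx *q)
{
  while (q->head) {
    struct tx_item *it = q->head;
    int rv = mock_send(q);

    if (rv < 0)
      return rv;            /* error: leave the item in the queue */

    q->head = it->next;     /* rv >= 0: our copy is no longer needed */
    free(it->data);
    free(it);
    q->freed++;

    if (rv == 0)
      return 0;             /* data still buffered: wait for tx_hook */
  }
  return 1;
}
```

With rv > 0 the loop keeps going until the queue is empty; with rv == 0
exactly one item is released and the rest waits for the next tx_hook.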

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
