On Thu, Nov 21, 2024 at 02:56:17PM +0100, Remi Pommarel wrote:
> On Thu, Nov 21, 2024 at 02:05:58PM +0100, Antonio Quartulli wrote:
> > On 20/11/2024 18:47, Remi Pommarel wrote:
> > > The number of TT changes can be less than initially expected in
> > > batadv_tt_tvlv_container_update() (changes can be removed by
> > > batadv_tt_local_event() in ADD+DEL sequence between reading
> > > tt_diff_entries_num and actually iterating the change list under lock).
> > >
> > > Thus tt_diff_len could be bigger than the actual changes size that need
> > > to be sent. Because batadv_send_my_tt_response sends the whole
> > > packet, uninitialized data can be interpreted as TT changes on other
> > > nodes leading to weird TT global entries on those nodes such as:
> > >
> > > * 00:00:00:00:00:00 -1 [....] ( 0) 88:12:4e:ad:7e:ba (179)
> > > (0x45845380)
> > > * 00:00:00:00:78:79 4092 [.W..] ( 0) 88:12:4e:ad:7e:3c (145)
> > > (0x8ebadb8b)
> > >
> > > All of the above also applies to OGM tvlv container buffer's tvlv_len.
> > >
> > > Remove the extra allocated space to avoid sending uninitialized TT
> > > changes in batadv_send_my_tt_response() and batadv_v_ogm_send_softif().
> > >
> > > Fixes: e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within
> > > OGMs")
> > > Signed-off-by: Remi Pommarel <[email protected]>
> > > ---
> > > net/batman-adv/translation-table.c | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/net/batman-adv/translation-table.c
> > > b/net/batman-adv/translation-table.c
> > > index 2243cec18ecc..f0590f9bc2b1 100644
> > > --- a/net/batman-adv/translation-table.c
> > > +++ b/net/batman-adv/translation-table.c
> > > @@ -990,6 +990,7 @@ static void batadv_tt_tvlv_container_update(struct
> > > batadv_priv *bat_priv)
> > > int tt_diff_len, tt_change_len = 0;
> > > int tt_diff_entries_num = 0;
> > > int tt_diff_entries_count = 0;
> > > + size_t tt_extra_len = 0;
> > > u16 tvlv_len;
> > > tt_diff_entries_num = atomic_read(&bat_priv->tt.local_changes);
> > > @@ -1027,6 +1028,9 @@ static void batadv_tt_tvlv_container_update(struct
> > > batadv_priv *bat_priv)
> > > }
> > > spin_unlock_bh(&bat_priv->tt.changes_list_lock);
> >
> > what speaks against acquiring tt.changes_list_lock before reading
> > tt.local_changes? (and making sure the writer does the update under lock
> > too) Any reason for not pursuing that path?
>
> Nothing against, just tried to follow old behavior in case this was
> that way for performance reasons. That would mean
> batadv_tt_local_commit_changes_nolock() to take the lock; because it
> is only called once per OGM interval I think that would be ok.
Actually I initialy though that holding this lock in
batadv_tt_local_commit_changes_nolock() would be fine, but because it
purges client and updates crc (two fairly intensive operations), that
could be a bad idea to hold the lock that long.
So batadv_tt_local_commit_changes_nolock() would still need to get
tt.local_changes without the lock (needing READ_ONCE), hence updates
would need WRITE_ONCE to avoid store tearing as discussed with Sven.
So the patch would be quite similar, only tt->tt.changes_list_lock will
be taken sooner in batadv_tt_tvlv_container_update().
That will fix the ADD between two read situation as you described
though.
Do you still prefer this option ?
Thanks
--
Remi