On 21/11/2024 21:24, Remi Pommarel wrote:
On Thu, Nov 21, 2024 at 07:02:43PM +0100, Sven Eckelmann wrote:
On Thursday, 21 November 2024 16:07:24 CET Remi Pommarel wrote:
So the patch would be quite similar, only tt->tt.changes_list_lock would
be taken sooner in batadv_tt_tvlv_container_update().

That would fix the ADD-between-two-reads situation you described,
though.

Do you still prefer this option?

I can't speak for Antonio, but I think I would prefer the current version
for the fix. The locking one would again end up with nested spinlocks and
maybe more refactoring. And it might be nicer for the stable backports to
have less noise in the patch.

I tend to second that because, if I understand correctly, even if
tt.changes_list_lock were taken sooner in batadv_tt_tvlv_container_update(),
the scenario Antonio described could still happen. Indeed, if the ADD (or
even a DEL, for that matter) happens between the VLAN table CRC computation
(batadv_tt_local_update_crc()) and the lock, the neighbor will see a CRC
mismatch and send a TT_REQUEST anyway. The race window would be smaller,
but still there.

So if I am not mistaken, the only way to eliminate the race
completely would be to compute the CRC while holding the lock, and this I
don't really like.

I agree with Sven, and I'd not jump into more refactoring.
Better to keep the bugfix as is.

Cheers,



Btw., I just noticed that the code for filling the diff entries (not in
your change, but overall) could have been something like:

--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -980,6 +980,7 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv)
        int tt_diff_entries_count = 0;
        bool drop_changes = false;
        size_t tt_extra_len = 0;
+       LIST_HEAD(tt_diffs);
        u16 tvlv_len;
 
        tt_diff_entries_num = READ_ONCE(bat_priv->tt.local_changes);
@@ -1011,9 +1012,10 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv)
        spin_lock_bh(&bat_priv->tt.changes_list_lock);
        bat_priv->tt.local_changes = 0;
+       list_splice_init(&bat_priv->tt.changes_list, &tt_diffs);
+       spin_unlock_bh(&bat_priv->tt.changes_list_lock);
 
-       list_for_each_entry_safe(entry, safe, &bat_priv->tt.changes_list,
-                                list) {
+       list_for_each_entry_safe(entry, safe, &tt_diffs, list) {
                if (tt_diff_entries_count < tt_diff_entries_num) {
                        memcpy(tt_change + tt_diff_entries_count,
                               &entry->change,
@@ -1023,7 +1025,6 @@ static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv)
                list_del(&entry->list);
                kmem_cache_free(batadv_tt_change_cache, entry);
        }
-       spin_unlock_bh(&bat_priv->tt.changes_list_lock);
 
        tt_extra_len = batadv_tt_len(tt_diff_entries_num -
                                     tt_diff_entries_count);


And then you could also move this before the "tt_diff_entries_num = ..."
line and save the bat_priv->tt.local_changes value corresponding to the
spliced list into a local variable, also inside the locked section, and
then operate on this variable for the other decisions. Of course, you
would still have to clean up the local list in case of an error, which
would slightly change the behavior on an allocation error in
batadv_tt_prepare_tvlv_local_data (previously the list was kept as it
was).

But if it were done like this, then we could also remove the READ_ONCE and
avoid introducing the WRITE_ONCE, because local_changes would then only be
touched inside a locked area (see changes_list_lock).

Please double check these statements - this was just a simple brain dump.

Yes, that would be a much more elegant way to handle it. Unfortunately,
unless I am missing something, the WRITE_ONCE/READ_ONCE would still be
needed because batadv_tt_local_commit_changes_nolock() has to load
tt.local_changes outside the lock to check whether it needs to purge
clients and recompute CRCs.

Thanks,


--
Antonio Quartulli
