Re: [PATCH dpdk 0/2] hash: safe data replacement on overwrite

Stephen Hemminger Thu, 12 Feb 2026 14:56:06 -0800

On Thu, 12 Feb 2026 22:33:15 +0100
Robin Jarry <[email protected]> wrote:


> When rte_hash_add_key_data() is called with an existing key, the old
> data pointer is silently overwritten. With RCU-protected readers still
> potentially accessing the old data, the application cannot safely free
> it since the old pointer is never returned.
> 
> This series provides two ways to address the problem:
> 
> * Automatic: when RCU is configured with a free_key_data_func callback,
>   rte_hash_add_key_data() now automatically defers freeing the old data
>   on overwrite. No caller changes are needed.
> 
> * Explicit: a new rte_hash_replace_key_data() API returns the previous
>   data pointer via an output parameter, giving the caller full control
>   over the old pointer's lifecycle. No automatic deferred free is
>   performed in that case.
> 
> Robin Jarry (2):
>   hash: free replaced data on overwrite when RCU is configured
>   hash: add replace API returning old data on overwrite
> 
>  app/test/test_hash.c                   | 214 +++++++++++++++++++++++++
>  doc/guides/rel_notes/release_26_03.rst |  12 ++
>  lib/hash/rte_cuckoo_hash.c             | 146 +++++++++++++----
>  lib/hash/rte_hash.h                    |  75 ++++++++-
>  4 files changed, 417 insertions(+), 30 deletions(-)


Subtle change, so fed to AI and got.

Here's the deep dive review. The two most substantive findings:

1. Fragile overwrite/delete discriminator in
__hash_rcu_qsbr_free_resource — The dequeue callback uses old_data !=
NULL to distinguish overwrite entries from delete entries, but the
enqueue side already sets key_idx = 0 as a sentinel for overwrites.
Using key_idx == 0 would be more robust, because if an application
legitimately stores NULL data and overwrites the key, the current code
will fall through to the delete path and try to recycle key index 0
(the dummy slot).

2. Leaked old data on rte_rcu_qsbr_dq_enqueue failure — The enqueue
failure in __hash_rcu_auto_free_old_data logs an error but the old data
becomes permanently unreachable. This is consistent with the existing
delete-path pattern, but it's a real resource leak on error. The
overall design is solid — the split between automatic RCU free (patch
1) for existing users and explicit replace API (patch 2) for
fine-grained control is clean, and the interaction between the two is
correctly ordered in the overwrite: label.

Long review.

# DPDK Hash Library Patch Series Review

**Series**: hash: free replaced data on overwrite / add replace API
**Author**: Robin Jarry <[email protected]>
**Patches**: 2
**Date**: 2026-02-12

---

## Patch 1/2: hash: free replaced data on overwrite when RCU is configured

### Summary

When `rte_hash_add_key_data()` overwrites an existing key, the old data pointer 
was silently lost. With RCU-protected readers still potentially accessing the 
old data, applications had no safe way to free it. This patch adds automatic 
deferred freeing of the old data pointer via the RCU defer queue when a 
`free_key_data_func` callback is configured.

### Correctness Analysis

#### **Error — Potential use of NULL data treated as "no overwrite" (Medium 
confidence ~60%)**

In `__rte_hash_add_key_with_hash()`, the overwrite detection relies on:

```c
overwrite:
    if (saved_old_data != NULL)
        __hash_rcu_auto_free_old_data(h, saved_old_data);
    return ret;
```

If an application legitimately stores `NULL` as the data pointer for a key 
(which `rte_hash_add_key_data()` permits — `data` is `void *` with no 
restriction against NULL), and then overwrites that key with new data, 
`saved_old_data` will be `NULL` and the overwrite path will silently skip the 
auto-free. This is probably fine for the auto-free case (freeing NULL is a 
no-op anyway), but it means the code cannot distinguish "no overwrite happened" 
from "overwrite happened, old data was NULL." This becomes more significant in 
Patch 2/2 where `old_data` is exposed to the caller — see below.

#### **Error — `__hash_rcu_qsbr_free_resource` overwrite detection uses 
`old_data != NULL` sentinel (Medium-high confidence ~70%)**

In the dequeue callback:

```c
/* Overwrite case: free old data only, do not recycle slot */
if (rcu_dq_entry.old_data != NULL) {
    h->hash_rcu_cfg->free_key_data_func(
        h->hash_rcu_cfg->key_data_ptr,
        rcu_dq_entry.old_data);
    return;
}
```

This uses `old_data != NULL` to distinguish overwrite entries from delete 
entries. But overwrite entries are enqueued with `key_idx = 0` as the sentinel:

```c
struct __rte_hash_rcu_dq_entry rcu_dq_entry = {
    .key_idx = 0,
    .ext_bkt_idx = EMPTY_SLOT,
    .old_data = old_data_val,
};
```

The `key_idx == 0` check would be a more robust sentinel since key index 0 is 
the reserved dummy entry in the cuckoo hash. Using `old_data != NULL` means 
that if someone stores NULL data and then overwrites, the entry flows through 
to the delete path which will try to access `keys[0]` — the dummy slot. The 
consequences depend on what's in the dummy slot, but it's at minimum 
semantically wrong. **Recommendation**: check `key_idx == 0` instead of 
`old_data != NULL` as the overwrite discriminator:

```c
if (rcu_dq_entry.key_idx == 0) {
    /* Overwrite case */
    if (rcu_dq_entry.old_data != NULL)
        h->hash_rcu_cfg->free_key_data_func(...);
    return;
}
```

#### **Error — Missing error check on `rte_rcu_qsbr_dq_enqueue` (Low-medium 
confidence ~55%)**

In `__hash_rcu_auto_free_old_data()`:

```c
if (rte_rcu_qsbr_dq_enqueue(h->dq, &rcu_dq_entry) != 0)
    HASH_LOG(ERR, "Failed to push QSBR FIFO");
```

The enqueue failure is logged but the old data pointer is leaked. The overwrite 
has already happened (the data in the hash table has been updated), so by the 
time we reach this point, the old data is unreachable. If enqueue fails, the 
old data will never be freed. This is a resource leak on a (presumably rare) 
error path. The existing delete path in `__rte_hash_del_key_with_hash` has the 
same pattern, so this is consistent with existing code, but it's worth noting 
that a failed enqueue here means a permanent memory leak of the old data.

#### **Warning — Behavioral change in the ext-table duplicate-check path**

In the extended-table section of `__rte_hash_add_key_with_hash`, the original 
code jumped to `failure:` when a duplicate was found (which unlocked and 
returned). The new code jumps to `overwrite:` after explicitly unlocking:

```c
ret = search_and_update(h, data, key, prim_bkt, short_sig,
            &saved_old_data);
if (ret != -1) {
    enqueue_slot_back(h, cached_free_slots, slot_id);
    __hash_rw_writer_unlock(h);
    goto overwrite;
}
```

This is correct — the unlock happens before the goto, and `overwrite:` doesn't 
attempt to unlock. Just confirming this is safe. The change from `goto failure` 
to explicit unlock + `goto overwrite` is necessary and correct.

### Test Coverage

The test `test_hash_rcu_qsbr_replace_auto_free` is well-structured:
- Creates hash with RW_CONCURRENCY_LF and RCU QSBR in DQ mode
- Registers a pseudo reader
- Tests overwrite + reclaim, then delete + reclaim
- Verifies the callback receives the correct old data pointers
- Proper cleanup in the `end:` label

The cleanup path correctly handles the case where `rcu.v` might be NULL 
(skipping thread offline/unregister). `rte_hash_free(NULL)` and 
`rte_free(NULL)` are both safe.

### Documentation

Release notes entry is appropriate and well-written. The API doc updates to 
`rte_hash.h` clearly document the new behavior.

---

## Patch 2/2: hash: add replace API returning old data on overwrite

### Summary

Adds `rte_hash_replace_key_data()` and `rte_hash_replace_key_with_hash_data()` 
— variants of the add APIs that return the previous data pointer via an output 
parameter. When the caller provides `old_data`, no automatic RCU deferred free 
is performed; the caller takes ownership.

### Correctness Analysis

#### **Error — `saved_old_data` initialized to NULL but not reset between paths 
(Medium confidence ~65%)**

In `__rte_hash_add_key_with_hash`, `saved_old_data` is initialized to `NULL` at 
declaration. The function has multiple `search_and_update` calls, each of which 
may or may not set `saved_old_data`. If a first `search_and_update` call 
doesn't find the key (returns -1), `saved_old_data` stays NULL. A later call 
that does find the key will set it. This is correct. However, consider the path 
where `rte_hash_cuckoo_insert_mw` returns 1 (key found during insert):

```c
ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
                short_sig, slot_id, &ret_val,
                &saved_old_data);
if (ret == 0)
    return slot_id - 1;
else if (ret == 1) {
    enqueue_slot_back(h, cached_free_slots, slot_id);
    ret = ret_val;
    goto overwrite;
}
```

On the `ret == 0` path (fresh insert, no overwrite), the function returns 
directly without going through `overwrite:`. So `old_data` is never set for 
fresh inserts via this path. But `rte_hash_replace_key_with_hash_data` 
initializes `*old_data = NULL` before the call, so this is safe — the caller 
sees NULL for fresh inserts. This is correct.

#### **Warning — New experimental APIs missing `__rte_experimental` annotation 
validation**

The new functions `rte_hash_replace_key_data()` and 
`rte_hash_replace_key_with_hash_data()` are correctly marked 
`__rte_experimental` in the header and use `RTE_EXPORT_EXPERIMENTAL_SYMBOL` in 
the `.c` file. This follows DPDK conventions.

#### **Info — API design: `old_data` is required (non-NULL) for replace 
functions**

`rte_hash_replace_key_with_hash_data` validates `old_data != NULL`:

```c
RETURN_IF_TRUE(((h == NULL) || (key == NULL) ||
        (old_data == NULL)), -EINVAL);
```

This is a reasonable design choice — if you don't want old_data, use the 
existing `rte_hash_add_key_data()` instead. The semantics are clean: replace 
always tells you what was there before.

### Test Coverage

`test_hash_replace_key_data` covers:
- Overwrite path: verifies `old_data` returns the previous value
- Lookup after replace: verifies the new data is stored correctly
- Fresh insert via replace: verifies `old_data` is set to NULL

The test intentionally sets `old_data = (void *)0xdeadbeef` before the 
fresh-insert replace to verify it gets zeroed. Good practice.

**Note**: The test doesn't cover the `rte_hash_replace_key_with_hash_data()` 
variant (with precomputed hash). Since it's a thin wrapper that delegates to 
the same internal function, the coverage is adequate, but a dedicated test for 
the hash variant would be thorough.

### Documentation

Release notes and API documentation are comprehensive and clear. The doxygen 
comments properly document the ownership semantics (caller owns old pointer, no 
auto-RCU-free).

---

## Cross-Patch Observations

### Overall Architecture Assessment

The design is clean and well-thought-out:

1. **Patch 1** adds automatic RCU deferred free as a "zero-effort" improvement 
for existing users who already have `free_key_data_func` configured.
2. **Patch 2** adds an explicit replace API for users who need precise control 
over the old data's lifecycle.
3. The two features interact correctly: when `old_data` is provided (replace 
API), automatic RCU free is suppressed; when `old_data` is NULL (existing add 
API), automatic RCU free kicks in.

The priority of the `overwrite:` label logic in the final version is:

```c
overwrite:
    if (old_data != NULL)       // caller wants ownership (replace API)
        *old_data = saved_old_data;
    else if (saved_old_data != NULL)  // auto-free via RCU (add API)
        __hash_rcu_auto_free_old_data(h, saved_old_data);
    return ret;
```

This is the correct priority ordering.

### Key Finding Summary

| # | Severity | Patch | Finding |
|---|----------|-------|---------|
| 1 | Error | 1/2 | `old_data != NULL` sentinel in 
`__hash_rcu_qsbr_free_resource` is fragile; use `key_idx == 0` instead to 
discriminate overwrite vs delete entries |
| 2 | Error | 1/2 | Failed `rte_rcu_qsbr_dq_enqueue` leaks old data (consistent 
with existing code pattern but worth noting) |
| 3 | Info | 2/2 | No test for `rte_hash_replace_key_with_hash_data()` variant 
specifically |

Re: [PATCH dpdk 0/2] hash: safe data replacement on overwrite

Reply via email to