https://bugs.kde.org/show_bug.cgi?id=417377

            Bug ID: 417377
           Summary: False-positive "uninitialised value(s)" reports from
                    memcheck
           Product: valgrind
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: memcheck
          Assignee: jsew...@acm.org
          Reporter: l...@yuriev.ru
  Target Milestone: ---

SUMMARY

When dealing with memcheck messages about reading uninitialized data in some
project, I came to the conclusion that these are false-positives. I would like
to understand with the problem. Below I described my actions and findings.

Basically I see the reports like this:
==21788== Conditional jump or move depends on uninitialised value(s)
==21788==    at 0x4E561A7: mdbx_pnl_check (core.c:2733)
==21788==    by 0x4E561A7: mdbx_page_alloc.isra.61 (core.c:5073)
==21788==    by 0x4E5742A: mdbx_page_touch (core.c:5433)
==21788==    by 0x4E57F8F: mdbx_page_search (core.c:10794)
==21788==    by 0x4E5E214: mdbx_cursor_touch (core.c:11656)
==21788==    by 0x4E623F2: mdbx_cursor_put (core.c:11865)
==21788==    by 0x4E63C39: mdbx_put (core.c:14879)
==21788==    by 0x11A068: testcase::insert(std::unique_ptr<keygen::result,
keygen::buffer_deleter> const&, std::unique_ptr<keygen::result,
keygen::buffer_deleter> const&, unsigned int) (test.cc:617)
==21788==    by 0x115358: testcase_nested::grow_head(unsigned int)
(nested.cc:195)
==21788==    by 0x1157D9: testcase_nested::run() (nested.cc:272)
==21788==    by 0x1190A1: test_execute(actor_config const&) (test.cc:585)
==21788==    by 0x10F18D: osal_actor_start(actor_config const&, int&)
(osal-unix.cc:369)
==21788==    by 0x10C4C0: main (main.cc:525)
==21788==  Uninitialised value was created by a client request
==21788==    at 0x4E5ABD2: mdbx_page_malloc (core.c:3563)
==21788==    by 0x4E5EBE7: mdbx_page_split (core.c:14502)
==21788==    by 0x4E62720: mdbx_cursor_put (core.c:12214)
==21788==    by 0x4E63C39: mdbx_put (core.c:14879)
==21788==    by 0x11A068: testcase::insert(std::unique_ptr<keygen::result,
keygen::buffer_deleter> const&, std::unique_ptr<keygen::result,
keygen::buffer_deleter> const&, unsigned int) (test.cc:617)
==21788==    by 0x115358: testcase_nested::grow_head(unsigned int)
(nested.cc:195)
==21788==    by 0x1157D9: testcase_nested::run() (nested.cc:272)
==21788==    by 0x1190A1: test_execute(actor_config const&) (test.cc:585)
==21788==    by 0x10F18D: osal_actor_start(actor_config const&, int&)
(osal-unix.cc:369)
==21788==    by 0x10C4C0: main (main.cc:525)

This is a test code for an embedded DBMS (called MDBX), which is well checked.
The presence of such an error (getting uninitialized data in the list of pages)
is extremely unlikely, since it will lead to damage to the DBMS (B+tree)
structure, and the test could not be completed successfully.

In addition, the MDBX code supports the build mode with-valgrind, in which more
strict control is organized through Valgrind's "Client Request" mechanism. In
this mode, there are no messages about using uninitialized data (not counting
the symptoms of https://bugs.kde.org/show_bug.cgi?id=417075). However, the
absence of messages in this mode is not sufficient, since (first) users do not
use it to check their programs, and (second) there may be an errors using the
"client request" mechanism.

So I tried to make some modifications to manifest a bugs. In simpler terms:
1) To minimize the influence I defined xx_VALGRIND_MAKE_MEM_UNDEFINED macro
exactly the same as VALGRIND_MAKE_MEM_UNDEFINED, and redefne all native
Valgrind's macros to the "do{}while(0)". This allows me to make a client
request in one place, but disable all the others without changing the source
code.
2) Added initialization of the suspected memory allocation with a
characteristic value (e.g. 42) with a following call to
xx_VALGRIND_MAKE_MEM_UNDEFINED(). Thus now if I see the values "42" in the
data, then the valgrind messages are correct, otherwise false-positive.
For complete changeset see
https://github.com/leo-yuriev/libmdbx/commit/5bf3d285cb55a8266308d5b5d9e236e40f282c04


So, actually the source code of the relevant function is:

MDBX_page *mdbx_page_malloc(MDBX_txn *txn, unsigned num) {
  MDBX_env *env = txn->mt_env;
  MDBX_page *np = env->me_dpages;
  size_t size = env->me_psize;
  if (likely(num == 1 && np)) {
    ASAN_UNPOISON_MEMORY_REGION(np, size);
    VALGRIND_MEMPOOL_ALLOC(env, np, size);
    VALGRIND_MAKE_MEM_DEFINED(&np->mp_next, sizeof(np->mp_next));
    env->me_dpages = np->mp_next;
  } else {
    size = pgno2bytes(env, num);
    np = mdbx_malloc(size);
    if (unlikely(!np)) {
      txn->mt_flags |= MDBX_TXN_ERROR;
      return np;
    }
    VALGRIND_MEMPOOL_ALLOC(env, np, size);
  }
  memset(np, 42, size);
#if MDBX_DEBUG
  np->mp_pgno = 0;
#endif
  xx_VALGRIND_MAKE_MEM_UNDEFINED(np, size);
  np->mp_flags = 0;
  np->mp_pages = num;
  return np;
}
https://github.com/leo-yuriev/libmdbx/blob/5bf3d285cb55a8266308d5b5d9e236e40f282c04/src/elements/core.c#L3540-L3567

Then I ran the test in stop mode on an error, connect by gdb+vgdb, and looked
at the contents of the memory bytes that valgrind reported as uninitialized.
There are no 42 values. Always and everywhere I saw correct meaningful values,
which is expected (otherwise the test would break quickly). 
This allows me to be sure that valgrind messages are false-positive.

A couple of significant notes:
1. If we comment out xx_VALGRIND_MAKE_ME_UNDEFINED, but leave memset(.., 42,
..), as expected Valgrind stops reporting uninitialized data, and the DBMS test
still works without errors.
2. I use Valgrind version 3.13 (with GCC 7.4), since it does not show the
problem https://bugs.kde.org/show_bug.cgi?id=417075, but version 3.15 (with GCC
9.1) behaves similarly, i.e. it gives false-positive reports.
3. I was able to reproduce the problem only in a multi-process test scenario,
when the original process runs multiple copies of itself that are operating
concurrently with a single DB in shared memory. However, the" non-initialized "
memory allocations reported by valgrind are not shared between processes, i.e.
they are always local.

STEPS TO REPRODUCE
1.  git clone https://github.com/leo-yuriev/libmdbx.git --branch
digging-valgrind
2.  cd libmdbx
3. make -j2 mdbx_test && rm -f /dev/shm/mdbx-test.* *.log && \
valgrind --trace-children=yes --log-file=valgrind-%p.log --leak-check=full
--track-origins=yes --error-exitcode=42
--suppressions=test/valgrind_suppress.txt --vgdb=yes --vgdb-error=2 \
./mdbx_test basic > test.log
4. Wait about 30 seconds and look at the valgrind-*.log files to find out the
pid of the process with the error.
5. gdb ./mdbx_test; target remote | vgdb --pid=XYZ

I assume that such an error can be difficult to reproduce, analyze, and fix.
However, it is likely that this error is the cause of many cases of
false-positive reports. Therefore, I am ready to provide all possible
assistance in fixing a bug.

On the other hand, in addition to the correction itself, it is important for me
to confirm the existence of such an Valgrind's bug. Because it allows you to
explain the reason and eliminate some users' distrust of my code.

Thanks in advance.

Regards,
Leonid.

-- 
You are receiving this mail because:
You are watching all bug changes.

Reply via email to