Hi Chao, Thank you for testing and reviewing. Please find attached the updated patches that incorporate your review comments.
>
> As the header comment says that “totals should not be NULL”, maybe add
> Assert(total!=NULL) to ensure that.
>
>
Added this.
> + /*
> + * Add the count of all the children contexts which are traversed
> + * including the parent.
> + */
> + *num_contexts = *num_contexts + ichild;
> +}
> ```
>
> *num_contexts is only initialized to 0, then *num_contexts = *num_contexts
> + ichild. Looks like the the initialization is unnecessary, and the
> assignment can just be *num_contexts = ichild.
>
Fixed accordingly.
>
> + int path[100];
> ```
>
> You already defined a constant MAX_PATH_DISPLAY_LENGTH below, would it
> make sense to pull up the macro definition and use it for “path” definition?
>
>
Makes sense, done in the attached patch.
> +
> +static const dshash_parameters memctx_dsh_params = {
> + offsetof(MemoryStatsDSHashEntry, memcxt_cv),
> + sizeof(MemoryStatsDSHashEntry),
> + dshash_strcmp,
> + dshash_strhash,
> + dshash_strcpy
> +};
> ```
>
> I wonder why we cannot just use the integer ProcNumber as the hash key? I
> think dshash fully supports fixed-size binary keys. We can use
> dshash_memcmp and dshash_memhash for compare and hash functions.
>
I have incorporated your suggestion in the updated patch—using integer
keys will be more efficient
for memory usage and will help avoid conversions between data types.
Thank you for the suggestion.
+static List *
> +compute_context_path(MemoryContext c, HTAB *context_id_lookup)
> +{
> As in the hash entry, path is a 100-elements array, would it make sense to
> also limit the path list to not append more than 100 items in this function?
>
We need to calculate the full path list to report the number of levels in
the tree.
We could choose to return the number of levels separately from the path
list and limit the
path list to no more than 100 levels, but we would still have to iterate
through all the levels.
Returning a single variable that captures both levels and path keeps the
function signature simple.
Kindly let me know your view.
> + dshash_release_lock(MemoryStatsDsHash, entry);
> + hash_destroy(context_id_lookup);
> +
> + MemoryContextSwitchTo(oldcontext);
> + MemoryContextDelete(memstats_ctx);
> + /* Notify waiting client backend and return */
> + ConditionVariableSignal(&entry->memcxt_cv);
> +}
> ```
>
> This looks like a race condition.
> ConditionVariableSignal(&entry->memcxt_cv); is called after the lock is
> released. So, there is a chance that a process has been terminated, and its
> before_shmem_exit callback acquired the entry lock and deleted the entry
> from the hash. So, I think we should send the signal before releasing the
> lock.
>
>
Good catch. Fixed accordingly.
>
> + */
> + if (ConditionVariableTimedSleep(&entry->memcxt_cv,
> +
> (MEMORY_STATS_MAX_TIMEOUT * 1000),
> +
> WAIT_EVENT_MEM_CXT_PUBLISH))
> ```
>
> For the wait loop, if the condition wakes up for some other reason, it
> will loop back and wait for the other 5 seconds, then total wait period
> will exceed 5 seconds, maybe 9 seconds. This is not a big deal, but the doc
> explicitly says “If the process does not respond with memory contexts
> statistics in 5 seconds”, so that behavior might be inconsistent with the
> doc.
>
> I think we can calculate remaining time and pass remaining milliseconds
> into ConditionVariableTimedSleep.
>
>
The following check in the beginning of the wait loop is intended to avoid
this.
/* Return if we have already exceeded the timeout */
if (elapsed_time >= MEMORY_STATS_MAX_TIMEOUT * 1000)
{
memstats_dsa_cleanup(client_procno);
memstats_client_key_reset(server_procno);
ConditionVariableCancelSleep();
ereport(NOTICE,
errmsg("request for memory context statistics for PID
%d timed out",
pid));
PG_RETURN_NULL();
}
It may still exceed by an amount less than 5 seconds, but it will always be
less
than 2 * MEMORY_STATS_MAX_TIMEOUT, because
we track the elapsed time and exit after the condition above is satisfied.
Maybe we should edit the docs to reflect this.
After ConditionVariableTimedSleep, rather than Assert(found), I think we
> should check if (!found). Because it waits for 5 seconds without holding
> the lock, the process may terminate and delete the entry during the period.
>
>
The client backend deletes the entry only upon exiting; no other process
can remove the entry.
Therefore, it is not possible for the client backend to execute this Assert
after the entry has been
deleted.
>
> This file is sololy added an include without any other change, that seems
> unneeded. I tried to remove this include, and the build still passed.
>
>
Fixed. Thank you for noticing this. It looks like a remnant from earlier
versions.
Thank you,
Rahila Syed
v51-0001-Add-function-to-report-memory-context-statistics.patch
Description: Binary data
v51-0002-Test-module-to-test-memory-context-reporting-wit.patch
Description: Binary data
