[ https://issues.apache.org/jira/browse/ZOOKEEPER-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ZOOKEEPER-5009:
--------------------------------------
    Labels: pull-request-available  (was: )

> Memory Leak in zoo_sasl_client_create
> -------------------------------------
>
>                 Key: ZOOKEEPER-5009
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5009
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.9.4
>            Reporter: Cyl
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: fake_strdup_trigger.c, repro_sasl_leak_loop.c, verify_sasl_leak.py
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*: In
> {{zookeeper-client/zookeeper-client-c/src/zk_sasl.c}}, the function
> {{zoo_sasl_client_create}} allocates a {{zoo_sasl_client_t}} structure
> ({{sc}}) and then duplicates several strings (service, host, mechlist) into
> it using {{_zsasl_strdup}}.
> If any of these duplications fails (e.g., due to out-of-memory), the
> function calls {{zoo_sasl_client_destroy(sc)}} and returns {{NULL}}.
> However, {{zoo_sasl_client_destroy}} only frees the _members_ of the
> structure, not the structure ({{sc}}) itself. As a result, every failed
> initialization leaks one {{zoo_sasl_client_t}} allocation
> ({{sizeof(zoo_sasl_client_t)}} bytes).
>  
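> {code:c}
> // Vulnerable pattern (simplified)
> zoo_sasl_client_t *sc = calloc(1, sizeof(*sc));
> // ... duplicate service, host and mechlist with _zsasl_strdup ...
> if (rc != ZOK) {
>     zoo_sasl_client_destroy(sc); // frees sc's members only
>     return NULL;                 // Leak: sc itself is never freed
> }
> {code}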
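A minimal, self-contained C model of this ownership bug follows. It is not the ZooKeeper source: demo_client_t, demo_strdup, demo_client_create, demo_client_destroy, tracked_calloc, tracked_free and the live_blocks counter are hypothetical stand-ins, and the model assumes (as the report states) that the destroy function releases only the members. The counter makes the leak visible: each failed create leaves exactly one block, the struct, allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for zoo_sasl_client_t; not the real layout. */
typedef struct {
    char *service;
    char *host;
    char *mechlist;
} demo_client_t;

/* Blocks allocated through tracked_calloc and not yet released. */
static int live_blocks = 0;

static void *tracked_calloc(size_t n, size_t size) {
    void *p = calloc(n, size);
    if (p != NULL) live_blocks++;
    return p;
}

static void tracked_free(void *p) {
    if (p != NULL) live_blocks--;
    free(p);
}

/* Hypothetical strdup that simulates OOM for the PoC trigger string. */
static char *demo_strdup(const char *s) {
    if (strcmp(s, "TRIGGER_OOM") == 0) return NULL;
    size_t len = strlen(s) + 1;
    char *copy = tracked_calloc(1, len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Like zoo_sasl_client_destroy as reported: frees members, not the struct. */
static void demo_client_destroy(demo_client_t *sc) {
    tracked_free(sc->service);
    tracked_free(sc->host);
    tracked_free(sc->mechlist);
    sc->service = sc->host = sc->mechlist = NULL;
}

/* Vulnerable create: the error path never releases sc itself. */
static demo_client_t *demo_client_create(const char *service,
                                         const char *host,
                                         const char *mechlist) {
    demo_client_t *sc = tracked_calloc(1, sizeof(*sc));
    if (sc == NULL) return NULL;
    int ok = (sc->service = demo_strdup(service)) != NULL
          && (sc->host = demo_strdup(host)) != NULL
          && (sc->mechlist = demo_strdup(mechlist)) != NULL;
    if (!ok) {
        demo_client_destroy(sc);
        return NULL; /* leak: the struct stays counted in live_blocks */
    }
    return sc;
}
```

Calling demo_client_create("zookeeper", "TRIGGER_OOM", "DIGEST-MD5") in a loop returns NULL every time: the service copy is freed by destroy, but live_blocks still grows by one per call, which is the growth the PoC observes as rising RSS.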
>  
> *Impact*: Memory allocation failures can be transient (e.g., a temporary
> load spike). Robust applications, such as ZooKeeper clients, are designed to
> retry connections and initialization. Because every failed attempt leaks a
> {{zoo_sasl_client_t}}, a temporary condition turns into a cumulative leak
> that grows with each retry and can eventually crash the application with an
> out-of-memory error.
> *Reproduction*: A proof of concept (PoC) simulates an allocation failure
> during the {{strdup}} calls.
>  # *Hook Library ({{fake_strdup_trigger.c}})*: a shared library loaded via
> {{LD_PRELOAD}} that intercepts {{strdup}}. It returns {{NULL}} when passed a
> specific trigger string ("TRIGGER_OOM"), simulating an OOM condition.
>  # *Loop Trigger ({{repro_sasl_leak_loop.c}})*: a C program that repeatedly
> calls {{zookeeper_init_sasl}} with the trigger string as the host, which
> makes {{zoo_sasl_client_create}} fail at the {{strdup(host)}} step.
>  # *Verification Script ({{verify_sasl_leak.py}})*: compiles the tools,
> runs the loop, and monitors the process's RSS memory usage.
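The attached fake_strdup_trigger.c is not reproduced in this message. A hypothetical sketch of such a hook follows, assuming Linux/glibc dynamic linking (a strdup defined in an LD_PRELOAD library takes precedence over the C library's) and assuming the strings reach the C library strdup through _zsasl_strdup. Instead of forwarding to the real strdup with dlsym(RTLD_NEXT, ...), this sketch performs the copy itself with malloc, which keeps it free of a libdl dependency.

```c
/* Hypothetical LD_PRELOAD hook; not the attached fake_strdup_trigger.c. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Input that makes the hook simulate an out-of-memory failure. */
#define OOM_TRIGGER "TRIGGER_OOM"

/* Replaces the C library strdup for every caller in the process. */
char *strdup(const char *s) {
    if (strcmp(s, OOM_TRIGGER) == 0) {
        errno = ENOMEM; /* mimic what a real allocation failure reports */
        return NULL;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

Built with `cc -shared -fPIC -o libfakestrdup.so fake_strdup_sketch.c` and run as `LD_PRELOAD=./libfakestrdup.so ./repro_sasl_leak_loop`, every strdup("TRIGGER_OOM") in the process would fail with errno set to ENOMEM, while all other strings are copied normally.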
>  
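> *Fix*: Add {{free(sc)}} to the error-handling path, after
> {{zoo_sasl_client_destroy(sc)}}.
> {code:c}
> if (rc != ZOK) {
>     zoo_sasl_client_destroy(sc); // release the duplicated strings
>     free(sc);                    // Fix: release the struct itself
>     return NULL;
> }
> {code}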
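The effect of the proposed fix can be checked with a small self-contained C model. All names here (demo_client_t, copy_or_fail, demo_client_create, demo_client_destroy, demo_client_free, live_clients) are hypothetical, and the model assumes, as the report states, that the destroy function frees only the members. With the struct freed on the error path, repeated failed initializations leave no client structs behind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for zoo_sasl_client_t; not the real layout. */
typedef struct {
    char *service;
    char *host;
    char *mechlist;
} demo_client_t;

static int live_clients = 0; /* demo_client_t structs not yet freed */

/* Copies s, or returns NULL for the PoC trigger string to simulate OOM. */
static char *copy_or_fail(const char *s) {
    if (strcmp(s, "TRIGGER_OOM") == 0) return NULL;
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Frees the members only, as reported for zoo_sasl_client_destroy. */
static void demo_client_destroy(demo_client_t *sc) {
    free(sc->service);
    free(sc->host);
    free(sc->mechlist);
    sc->service = sc->host = sc->mechlist = NULL;
}

/* Releases the struct itself and updates the live counter. */
static void demo_client_free(demo_client_t *sc) {
    if (sc == NULL) return;
    live_clients--;
    free(sc);
}

/* Fixed create: on failure, destroy the members, then free the struct. */
static demo_client_t *demo_client_create(const char *service,
                                         const char *host,
                                         const char *mechlist) {
    demo_client_t *sc = calloc(1, sizeof(*sc));
    if (sc == NULL) return NULL;
    live_clients++;
    int ok = (sc->service = copy_or_fail(service)) != NULL
          && (sc->host = copy_or_fail(host)) != NULL
          && (sc->mechlist = copy_or_fail(mechlist)) != NULL;
    if (!ok) {
        demo_client_destroy(sc);
        demo_client_free(sc); /* the step missing from the vulnerable path */
        return NULL;
    }
    return sc;
}
```

Keeping the destroy function members-only and freeing the struct at the call site, as the fix does, leaves the existing contract unchanged for callers that already free the struct themselves; making destroy free the struct as well would instead require checking every caller for a potential double free.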
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
