On Mon, 30 Mar 2026 at 16:25, Anton A. Melnikov <[email protected]> wrote: > > Hi! > > > My colleagues found that a queries like that: [...] > lead to excessive memory consumption up to 10Gb in this example and > query execution time up to ~1,5min. > > Bisect shows that the problem appeared after commit 83ea6c540 > (Virtual generated columns).
Yep, that looks about accurate. Thanks for the report and initial triage! > This indicates that generated expressions are reparsed multiple times, > once per row to be updated instead of being reused. [...] > I would like to propose a fix that add a caching of the the parsed > expression trees (Node *) in ResultRelInfo, so that build_column_default() > and stringToNode() are executed at most once per attribute per query. > > With this fix, the query execution time > and memory consumption return to normal: I think that fixes the wrong issue. Specifically, ExecInitGenerated mentions that it expects to be called just once per ResultRelInfo, and your code adds code to allow it to be called more than once without causing more problems. While it does solve the complaint, it's still executing more work than it should. Additionally, though it's not a failing of the patch per se, it changes the layout of an existing struct, and so it isn't backportable. The actual issue is that ExecComputeStoredGenerated uses ri_GeneratedExprsU's NULL-ness to check whether the generated columns' expressions have been initialized, whilst for UPDATE ResultRelInfos the initialized-indicator is stored in ri_extraUpdatedCols_valid. So, I think the attached patch is a more appropriate fix, it avoids calling into ExecInitGenerated at all when no column included in expressions was updated. It also adds an assertion that the function isn't called again once the field has been initialized. It also has the benefit of being backportable. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com)
v2-0001-nodeModifyTable-fix-generated-tables-memory-leak-.patch
Description: Binary data
