nooneuse opened a new pull request, #64621:
URL: https://github.com/apache/doris/pull/64621

   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   **Problem Summary**:
   
   - Previously, column DEFAULT values in Doris were primarily treated as 
literal constants (or a small set of special built-ins), which limited 
usability for common patterns like “derive default from current date/time” or 
“compose a formatted default string”.
   - This PR introduces expression-based column default values. Users can 
define a column default using an expression composed of:
     - deterministic built-in functions, and
     - a limited set of allowed non-deterministic date/time functions (e.g. 
now/current_timestamp/current_date) to support time-dependent defaults.
   - The PR also clarifies and enforces usage constraints to keep default 
expressions safe and analyzable, and adds a compatibility guard for 
merge-on-write (MoW) unique-key partial update to avoid unsafe default-filling 
behavior.
   
   **How to use**
   
   Example (date/datetime defaults):
   
   - `d DATEV2 NOT NULL DEFAULT to_date(now())`
   - `dt DATETIMEV2(3) NOT NULL DEFAULT now(3)`
   
   Example (composed string default):
   
   - `s STRING NOT NULL DEFAULT concat('a-', cast(to_date(now()) as string))`
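   
   As a minimal sketch, the column definitions above could be combined into 
one table as follows. The table name `expr_default_demo`, the `id` column, the 
key model, the bucketing, and the `replication_num` setting are illustrative 
assumptions and are not taken from this PR:
   
   ```sql
   -- Hypothetical table; only the d/dt/s column definitions come from this PR.
   CREATE TABLE expr_default_demo (
       id INT NOT NULL,
       d  DATEV2 NOT NULL DEFAULT to_date(now()),
       dt DATETIMEV2(3) NOT NULL DEFAULT now(3),
       s  STRING NOT NULL DEFAULT concat('a-', cast(to_date(now()) as string))
   )
   DUPLICATE KEY(id)
   DISTRIBUTED BY HASH(id) BUCKETS 1
   PROPERTIES ("replication_num" = "1");
   ```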
   
   **Main limitations / constraints**
   
   The default expression must be analyzable and side-effect free:
   
   - No column references.
   - No subqueries.
   - No aggregate functions.
   - No window functions.
   - No UDFs.
   
   Non-deterministic functions are restricted:
   
   - Only time-related functions such as 
now/current_timestamp/current_date are allowed in default expressions.
   - Other non-deterministic functions (e.g. rand()) are rejected.
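   
   For illustration, definitions like the following would be expected to fail 
analysis under the rules above. The table and column names are hypothetical, 
and the exact error messages are not specified here:
   
   ```sql
   -- Expected to be rejected: rand() is non-deterministic and is not one of
   -- the allowed date/time functions.
   CREATE TABLE bad_default_rand (
       id INT NOT NULL,
       r  DOUBLE NOT NULL DEFAULT rand()
   )
   DUPLICATE KEY(id)
   DISTRIBUTED BY HASH(id) BUCKETS 1
   PROPERTIES ("replication_num" = "1");
   
   -- Expected to be rejected: the default expression references a column (name).
   CREATE TABLE bad_default_colref (
       id   INT NOT NULL,
       name STRING NOT NULL,
       tag  STRING NOT NULL DEFAULT concat(name, '-x')
   )
   DUPLICATE KEY(id)
   DISTRIBUTED BY HASH(id) BUCKETS 1
   PROPERTIES ("replication_num" = "1");
   ```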
   
   Type must be compatible:
   
   - The expression result is cast/coerced to the target column type during 
analysis.
   
   **Default value behavior in different scenarios**
   
   INSERT without specifying the column:
   
   - The column value is generated from the default expression at write time.
   - For time-dependent expressions, the evaluated value depends on the 
statement execution time.
   
   INSERT explicitly providing a value:
   
   - The provided value is used; the default expression is not applied.
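   
   The two INSERT cases above can be sketched as follows. The table 
`insert_demo` and its columns are hypothetical; only the `DEFAULT now(3)` form 
is taken from the examples in this PR:
   
   ```sql
   -- Hypothetical table with one expression-default column.
   CREATE TABLE insert_demo (
       id      INT NOT NULL,
       created DATETIMEV2(3) NOT NULL DEFAULT now(3)
   )
   DUPLICATE KEY(id)
   DISTRIBUTED BY HASH(id) BUCKETS 1
   PROPERTIES ("replication_num" = "1");
   
   -- Column omitted: created is generated from now(3) at write time.
   INSERT INTO insert_demo (id) VALUES (1);
   
   -- Column provided: the given value is stored and the default is not applied.
   INSERT INTO insert_demo (id, created) VALUES (2, '2024-01-01 00:00:00.000');
   ```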
   
   MoW unique-key partial update (INSERT-triggered partial update, missing 
non-key columns):
   
   - If the table contains expression-default columns, partial update may 
require filling missing columns using schema defaults.
   - To avoid unsupported/unsafe behavior, this PR rejects such partial updates 
by default with a clear error message.
   - Users can explicitly allow it via session variable 
allow_partial_update_with_expression_default=true if they understand the 
behavior and accept the risks/constraints (see Behavior Details).
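   
   A sketch of the guard from the user's side is shown below. The table 
`mow_demo` is hypothetical. The `enable_unique_key_partial_update` and 
`enable_insert_strict` settings are existing Doris session variables that this 
sketch assumes are used to trigger INSERT-based partial update; they are not 
mentioned in this PR:
   
   ```sql
   -- Hypothetical MoW unique-key table with one expression-default column.
   CREATE TABLE mow_demo (
       id      INT NOT NULL,
       v       INT NULL,
       created DATETIMEV2(3) NOT NULL DEFAULT now(3)
   )
   UNIQUE KEY(id)
   DISTRIBUTED BY HASH(id) BUCKETS 1
   PROPERTIES (
       "replication_num" = "1",
       "enable_unique_key_merge_on_write" = "true"
   );
   
   -- Assumed setup: treat an INSERT with a column subset as a partial update.
   SET enable_unique_key_partial_update = true;
   SET enable_insert_strict = false;
   
   -- With the guard at its default, this statement is expected to be rejected
   -- because created (an expression-default column) is missing.
   INSERT INTO mow_demo (id, v) VALUES (1, 10);
   
   -- Explicit opt-in described in this PR; the same statement is then allowed.
   SET allow_partial_update_with_expression_default = true;
   INSERT INTO mow_demo (id, v) VALUES (1, 10);
   ```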
   
   > **Behavior Details**
   > 
   > 1) Write-time missing columns (DML)
   > 
   > **Covered behaviors**
   > 
   > - INSERT/INSERT INTO ... SELECT ... where some target columns are omitted.
   > - File-based ingestion (e.g. load/scan) where input data does not provide 
all destination columns.
   > - MoW unique-key partial update paths (INSERT-triggered partial update, 
and load jobs with 
unique_key_update_mode=UPDATE_FIXED_COLUMNS/UPDATE_FLEXIBLE_COLUMNS) where 
non-specified columns are treated as “missing” and need to be filled.
   > 
   > **What this PR does**
   > 
   > - Adds support for expression-based column default values, allowing 
defaults defined as expressions composed of:
   >   - deterministic built-in functions, and
   >   - a limited set of allowed non-deterministic date/time functions (e.g. 
now/current_timestamp/current_date).
   > - For normal DML inserts and ingestion planning, missing columns can use 
the column’s default SQL expression (getDefaultValueSql()), so time-dependent 
defaults behave as expected at write time.
   > - For MoW unique-key partial update, if the table contains 
expression-default columns, this PR rejects the operation by default and 
provides an explicit session switch to allow it:
   >   - allow_partial_update_with_expression_default=true
   > 
   > **Why**
   > 
   > - Partial update may require filling missing columns using schema 
defaults. For expression defaults, the semantics (especially with 
non-deterministic time functions) can be ambiguous and may differ from 
“evaluate at write time per row”.
   > - The default guard avoids silent incorrect results and forces users to 
opt in only when they understand the implications.
   > 
   > 2) Reading old data with missing columns (schema evolution read)
   > 
   > **Covered behaviors**
   > 
   > - Scanning/querying old rowsets/segments produced before a schema change 
(e.g. light schema change), where newly added columns do not physically exist 
in the old data files.
   > - Any scan path that materializes missing columns using a default-value 
iterator during read.
   > 
   > **What this PR does**
   > 
   > - Keeps the existing schema-evolution read behavior: missing columns are 
filled by underlying literal defaults.
   > - For expression-default columns, this PR stores an additional folded 
literal (realDefaultValue) computed at DDL/analyze time. Read-time filling uses 
realDefaultValue (a literal), not the original expression SQL.
   > 
   > **Why**
   > 
   > - The BE read path for missing columns expects a literal string that can 
be parsed into a field value; it does not execute arbitrary expressions during 
scan.
   > - Using realDefaultValue preserves compatibility and avoids introducing an 
expression execution dependency into read paths.
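   > 
   > As a sketch of the read-time behavior described above, assuming that 
ADD COLUMN accepts an expression default under this PR (the table `evo_demo` 
and its columns are hypothetical):
   > 
   > ```sql
   > -- Hypothetical table with rows written before the schema change.
   > CREATE TABLE evo_demo (
   >     id INT NOT NULL,
   >     v  INT NULL
   > )
   > DUPLICATE KEY(id)
   > DISTRIBUTED BY HASH(id) BUCKETS 1
   > PROPERTIES ("replication_num" = "1");
   > 
   > INSERT INTO evo_demo VALUES (1, 10), (2, 20);
   > 
   > -- The expression default is folded into a literal (realDefaultValue)
   > -- at DDL/analyze time.
   > ALTER TABLE evo_demo ADD COLUMN created DATETIMEV2(3) NOT NULL DEFAULT now(3);
   > 
   > -- Old rows have no physical created column, so both rows are expected to
   > -- show the same folded literal rather than the time of this query.
   > SELECT id, created FROM evo_demo ORDER BY id;
   > ```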
   > 
   > 3) Point query / row store missing-column fill (read path)
   > 
   > **Covered behaviors**
   > 
   > - Point query / rowid fetch / row-store related read paths that may need 
to materialize a full row and fill columns that are missing in the underlying 
storage representation.
   > 
   > **What this PR does**
   > 
   > - Uses the same approach as schema-evolution scans: for expression-default 
columns, read paths rely on the pre-computed literal realDefaultValue (exported 
as the column default string in descriptors) to fill missing columns.
   > 
   > **Why**
   > 
   > - These read paths are latency sensitive and are not designed to evaluate 
expression defaults at read time.
   > - Aligning point query / row store fill with the segment scan behavior 
ensures consistent semantics and implementation simplicity.
   
   ### Release note
   
   - Doris now supports expression-based column default values with strict 
validation rules.
   - INSERT can omit such columns and the default expression will be evaluated 
at write time.
   - For MoW unique-key tables, INSERT-triggered partial update is rejected by 
default when expression-default columns exist; it can be enabled via 
allow_partial_update_with_expression_default.
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
         - Doris now supports expression-based column default values with 
strict validation rules.
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

