bil-ly commented on issue #59396:
URL: https://github.com/apache/airflow/issues/59396#issuecomment-3717690568

   Hi @dstandish and @potiuk! I've analyzed the `include_prior_dates` behavior in 
`xcom_pull`. Here's what I found:
   
     ## Current Behavior
   
     When `include_prior_dates=True`:
     - XComs are retrieved from the current run OR any run where `logical_date 
<= current_run.logical_date`
     - Results ordered by: `logical_date DESC, timestamp DESC`
     - Uses `coalesce(logical_date, run_after)` as the comparison date, so runs 
without a `logical_date` fall back to `run_after`
     - Implementation in: `XComModel.get_many()` 
(airflow-core/src/airflow/models/xcom.py:314-327)
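
To make the behavior above concrete, here is a minimal pure-Python sketch of the filter and ordering as I understand them. It is not the actual `XComModel.get_many()` code: `XComRow`, `effective_date`, and `pull_with_prior_dates` are hypothetical names for illustration, and sorting on the coalesced date (rather than the raw `logical_date`) is my assumption about how runs with a null `logical_date` are ordered.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class XComRow:
    """Hypothetical stand-in for one stored XCom plus the dates of its DAG run."""
    run_id: str
    logical_date: Optional[datetime]
    run_after: datetime
    timestamp: datetime  # when the XCom was written


def effective_date(row: XComRow) -> datetime:
    # Mirrors coalesce(logical_date, run_after): fall back when logical_date is null.
    return row.logical_date if row.logical_date is not None else row.run_after


def pull_with_prior_dates(
    rows: List[XComRow], current_run_id: str, cutoff: datetime
) -> List[XComRow]:
    # Keep the current run, plus any run whose effective date is at or before the cutoff.
    candidates = [
        r for r in rows
        if r.run_id == current_run_id or effective_date(r) <= cutoff
    ]
    # Newest effective date first; ties broken by newest write timestamp.
    return sorted(
        candidates,
        key=lambda r: (effective_date(r), r.timestamp),
        reverse=True,
    )
```

Note that the first row returned can belong to a run other than `current_run_id`, which is the source of the parameter-semantics question in item 4.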
   
   I found 6 potential edge cases that need decisions:
   
     ### 1. Multiple runs with same logical_date
     **Current:** Returns most recent by timestamp
     **Question:** Is this the intended behavior? Should we document this 
ordering guarantee?
   
     ### 2. Null logical_date handling
     **Current:** Falls back to `run_after`
     **Question:** Should this be documented? Are there cases where this could 
cause unexpected behavior?
   
     ### 3. Cross-DAG pulls with include_prior_dates
     **Current:** Can search through many historical runs (performance concern)
     **Question:** Should we:
     - Add a warning when no limit is specified?
     - Automatically limit results?
     - Just document the performance implications?
   
     ### 4. Parameter semantics confusion
     **Current:** When `include_prior_dates=True`, the `run_id` parameter acts 
as the reference point for the date cutoff, but the returned XComs may come 
from other run_ids
     **Question:** Should we:
     - Rename or add a clearer parameter?
     - Just clarify in documentation?
   
     ### 5. Backfill scenarios
     **Current:** Uses logical_date regardless of actual execution order
     **Question:** Is this correct for all use cases? Any scenarios where this 
could cause issues?
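
A small illustration of the backfill concern, using made-up run IDs and dates rather than real Airflow objects: a backfilled run for an older `logical_date` can write its XCom later in wall-clock time than a newer scheduled run, so "most recent by logical_date" and "most recently written" pick different rows.

```python
from datetime import datetime

# Illustrative rows: (run_id, logical_date, timestamp the XCom was written).
runs = [
    ("scheduled__2024-01-02", datetime(2024, 1, 2), datetime(2024, 1, 2, 0, 5)),
    # Backfill for an older logical_date, executed days later.
    ("backfill__2024-01-01", datetime(2024, 1, 1), datetime(2024, 1, 10, 9, 0)),
]


def first_by_logical_date(rows):
    # Ordering described above: logical_date DESC, then timestamp DESC.
    return max(rows, key=lambda r: (r[1], r[2]))


def first_by_write_time(rows):
    # Alternative ordering: whichever XCom was written last.
    return max(rows, key=lambda r: r[2])


print(first_by_logical_date(runs)[0])  # scheduled__2024-01-02
print(first_by_write_time(runs)[0])    # backfill__2024-01-01
```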
   
     ### 6. No limit enforcement
     **Current:** No automatic limits on historical search depth
     **Question:** Should we add:
     - A default max lookback period?
     - Required limit parameter?
     - Just documentation?
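
For discussion, here is a rough sketch of what a bounded lookback could look like. `bound_history`, `max_lookback`, and `limit` are hypothetical names I made up for this example; none of them are existing `xcom_pull` parameters.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def bound_history(
    dates: List[datetime],
    cutoff: datetime,
    max_lookback: Optional[timedelta] = None,  # hypothetical option
    limit: Optional[int] = None,               # hypothetical option
) -> List[datetime]:
    # Only dates at or before the cutoff are eligible, as with include_prior_dates.
    eligible = [d for d in dates if d <= cutoff]
    # Optionally drop anything older than the lookback window.
    if max_lookback is not None:
        eligible = [d for d in eligible if d >= cutoff - max_lookback]
    # Newest first, then optionally cap the number of results.
    eligible.sort(reverse=True)
    return eligible if limit is None else eligible[:limit]
```

With both options left as `None` this keeps today's unbounded behavior, so a default could be introduced without changing existing call sites.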
   
   I verified these locations and confirmed `include_prior_dates` is NOT needed
   Looking forward to your guidance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
