The Survey has a long feedback loop (> a year) and it does not go deep enough to validate technical decisions like this one. So I would not really wait for it - especially since the current survey has just completed :)
I personally think there is value in offline migrations for power users, who can fix the migration scripts as needed while testing them and apply post-processing to the generated scripts. This is, IMHO, the main reason we have offline migrations. They are not supposed to work "out of the box" - they can be adapted by power users (for example by scripting around them) to fix whatever issues they hit. One such case is someone running a "postgres-like" database with subtle compatibility issues that need very small adjustments to the scripts. It is not really recommended to modify installed Airflow code to fix those, but such users can easily take the offline migration script, post-process it, test it, and then run it against their database.

And offline mode does not need to **always** work. It might well be that the regular online migration works fine for most steps but fails for one. Then power users might choose to:

a) migrate online until the failing step
b) perform that one step as an offline migration (with some modification of the offline script)
c) migrate online after the failing step

This is quite easy to script:

```
airflow db migrate --from-revision OLD_REVISION --to-revision x
airflow db migrate --from-revision x --to-revision x+1 --show-sql-only > output_to_modify.sql
convert_to_my_db output_to_modify.sql > output.sql
db-run output.sql
airflow db migrate --from-revision x+1 --to-revision NEW_REVISION
```

This is how I would approach such a migration if just one step was failing. So even if some past migrations don't support offline mode, making future migrations work with it is a good idea. Even if the offline variant is not super-performant, having a separate code path for it can make sense - we can run the online migration in a more performant way while producing a less performant, but static, SQL script for offline use.

J.

On Mon, Jan 26, 2026 at 10:03 PM Ferruzzi, Dennis <[email protected]> wrote:

> I'd love to see what people think here. Maybe we could even add a question to the next Airflow Survey to gauge user needs here and see if anyone actually uses offline mode or not?
>
> From what I can tell (for transparency, I used Cline/Claude Sonnet 4.5 to help look through all of the existing migrations for cases which handled offline migrations), your idea to just warn when offline seems to be how we have traditionally handled this in the past. The following snippet is Claude-generated:
>
> ```
> ## Current Pattern in Airflow Migrations
>
> I found several examples where complex data migrations explicitly do NOT support offline mode:
>
> __0055 (remove pickled data from dagrun)__: Warns users that the conf column will be NULL in offline mode, then skips all the Python-based pickle→JSON conversion logic
> __0015 (update trigger kwargs)__: Warns about the inability to decrypt trigger kwargs in offline mode
> __0032 (rename execution_date)__: Skips validation and data checks entirely in offline mode, leaves a TODO comment for users
> __0049 (remove pickled xcom data)__: Only checks `is_offline_mode()` to skip timeout configuration, but the actual data migration (INSERT...SELECT, DELETE, UPDATE with NaN handling) would fail in offline mode anyway
>
> The pattern is clear: complex data migrations that require Python logic, batch processing, or error handling simply cannot be represented as static SQL scripts. The existing approach is to warn users and either skip the data migration or delete rows to maintain schema consistency.
> ```
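To make that pattern concrete, here is a minimal sketch of such a warn-and-skip migration. Everything in it is a hypothetical stand-in (the table, column, and warning text are invented, not copied from any of the migrations listed above); the real ones differ in detail:

```python
# Hedged sketch of the warn-and-skip pattern: the hypothetical `my_table` and
# its `payload` column stand in for whatever a real data migration touches.
import sqlalchemy as sa
from alembic import context, op


def upgrade():
    if context.is_offline_mode():
        print(
            "WARNING: the data conversion for my_table cannot be expressed as "
            "static SQL. The generated script deletes the affected rows "
            "instead, to keep the schema consistent."
        )
        # Plain SQL like this *can* be rendered into the offline script.
        op.execute("DELETE FROM my_table WHERE payload IS NOT NULL")
        return

    # Online path: fetch rows over a live connection and convert them in
    # Python - exactly the part that cannot be rendered as a static script.
    conn = op.get_bind()
    for pk, payload in conn.execute(sa.text("SELECT id, payload FROM my_table")):
        converted = payload.upper()  # stand-in for the real conversion logic
        conn.execute(
            sa.text("UPDATE my_table SET payload = :p WHERE id = :pk"),
            {"p": converted, "pk": pk},
        )
```

A nice side effect for the power users mentioned above: the DELETE lands verbatim in the generated offline script, so anyone who would rather preserve the data can replace that one statement with a hand-written conversion before running it.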
> I am not a SQL expert by any means, and there are nearly 100 migrations to look through, so I am not making any blanket statements here; but as far as I can tell, this does appear to be an accurate assessment. In which case, I think your plan is solid unless someone weighs in with an alternative viewpoint.
>
> FWIW, when I asked Claude to compare the existing precedent with your proposal, it also made one suggestion which seems totally reasonable as well:
>
> ```
> However, I would suggest one enhancement: In the warning message, explicitly document what data users will lose and provide guidance on whether they can safely proceed (e.g., "This only affects incomplete DAG runs and will not impact completed historical runs").
> ```
>
> - ferruzzi
>
> ________________________________
> From: Kataria, Ramit <[email protected]>
> Sent: Wednesday, January 21, 2026 11:26 AM
> To: [email protected]
> Subject: [DISCUSS] Handling of offline migrations
>
> Hi all,
>
> TLDR: What's our approach for handling offline migrations in cases where the options are either deleting some likely non-critical data or spending lots of time on implementing, testing, and maintaining the offline implementation? Do we know if users actually use offline migrations?
>
> Some context: I worked on a complex migration [1] related to Deadline Alerts a few months ago. It was needed to ensure that users don't lose the archive of missed deadlines, or the deadlines for incomplete Dag runs, during the migration from 3.1 to 3.2. Another migration [2] will need a similar but less complex change to migrate the on_[success/failure]_callbacks for incomplete Dag runs (those callbacks are deleted from the DB once Dag runs complete anyway). For both of these, especially the former, I decided to implement them purely in SQLA/Python, without any direct SQL statements, because (and let me know if any of my assumptions are incorrect):
>
> * Some of the logic gets very unreadable when implemented in SQL
> * Slightly different SQL statements are required for each type of DB we support
> * As far as I know, we don't have a good way to run automated testing on these migrations, especially for the edge cases relevant to each migration. Manual testing with lots of edge cases is very time-consuming.
> * The table for each of those migrations is expected to be small enough that iterating through rows in Python will not cause a significant slowdown. I've described above what each migration is responsible for and made educated guesses about how users use these features and run migrations; however, I don't have data to support this.
>
> Basically, I've made a small compromise on performance to aim for correctness and robustness of these migrations by only using SQLA and iterating in Python. Later, when my PR was almost green in CI, I found out through a failing test that this approach would not work with offline migration, because the Python script would not be able to fetch existing rows from the database to iterate through them.
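The failure mode is worth spelling out: under `airflow db migrate --show-sql-only`, Alembic renders statements into a script through a mock connection instead of executing them, so a SELECT has no result set to return. A minimal sketch of the pattern that breaks, with hypothetical table and column names:

```python
# Row-iterating data migration (hypothetical names) that only works online.
import sqlalchemy as sa
from alembic import op


def upgrade():
    # Offline, this is a mock connection that just writes SQL text into the
    # script being generated - it cannot actually run a query.
    conn = op.get_bind()
    result = conn.execute(sa.text("SELECT id, callback FROM my_callbacks"))
    # Offline, this iteration fails: there are no rows, so the migration can
    # never decide per row which UPDATE/INSERT statements to emit.
    for pk, callback in result:
        ...  # per-row Python logic that static SQL cannot express up front
```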
> Fully supporting offline migration in this case would require re-implementation in SQL. However, I found some old migrations that also don't fully support offline migration, and I copied their approach of outputting a large warning saying that offline migration is not supported for that change. I also added an SQL statement to delete all the rows of the relevant table, which is needed for CI to pass and for users to be able to run the migration in offline mode and expect Airflow to work without DB errors once the migration is complete. So I'm wondering what the best approach would be for this migration specifically, before we ship 3.2, and also what the best practice should be moving forward:
>
> * Keep it as is: output a large warning in the offline migration script and delete the rows needed to proceed
> * Raise an error if the user tries to run the offline migration for this change, so that the user is forced to read and understand what is happening and what they need to do (delete rows from the given table) to proceed
> * Something interactive, where the user has to confirm that they're OK with deleting those rows
> * Re-implement the migration in multiple SQL scripts
>
> Long term, I believe an easier and more automated way to test edge cases in the dataset during migration would be very helpful and would improve robustness. I also wonder if we have any anecdotes/data on offline migrations actually being used.
>
> [1] https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/migrations/versions/0092_3_2_0_replace_deadline_inline_callback_with_fkey.py
> [2] https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/migrations/versions/0091_3_2_0_restructure_callback_table.py
>
> Best,
> Ramit
