In some cases we do not have a DAG ID and only have the path to the file.
In my opinion, we can do it now. One of the most important changes is to ensure a stable ID. Right now we delete the errors and add them again, whereas we should check which errors should be deleted and which should be added. If we have a stable ID, then we can add new metadata to the row.

Why do we need a new table, import_errors_history? Can't we use the current table?

On Thu, Sep 17, 2020 at 10:01 AM Jarek Potiuk <[email protected]> wrote:

> I am all for it.
>
> This should be - likely - connected with the future versioning of DAGs
> (currently deferred to 2.1).
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning
> - possibly, rather than being a separate AIP, it should be
> incorporated there.
>
> I believe in the versioning implementation we will already have a
> table where we will keep information about DAGs together with their
> hash, so it seems natural that such "errors" should be connected to
> such "DAG_ID" "HASH" combination.
>
> And I would love to change the name of it to "parse errors". "Import
> errors" suggests that those are errors that come from a wrong "import"
> statement. But we are really talking about any kind of parsing error.
>
> J.
>
> On Wed, Sep 16, 2020 at 3:48 AM Jacob Ferriero
> <[email protected]> wrote:
> >
> > Hello Airflow Dev List,
> >
> > I'm considering proposing a refactor to import errors in order to
> > support sending alert emails when the scheduler finds an import error
> > (but not every time the scheduler finds the same import error). This
> > is currently not possible because the import errors are cleared
> > during each scheduler loop.
> >
> > I'd like to poll the community for perspectives on other shortcomings
> > of the import error model before proposing a refactor or other use
> > cases folks might have for such a refactor (e.g. supporting an
> > arbitrary callback function similar to SLA miss).
> >
> > My current thought is to just add an import_errors_history table to
> > the database that is not cleared on each scheduler loop and does keep
> > track of if an email was sent in a boolean field. The primary key
> > could be constructed from a file hash and exception classname.
> >
> > Does this one use case warrant a new table? Should we just replace
> > the import_errors table in place?
> >
> > If I can get a sense of high-level direction I can put together an
> > AIP / PR.
> >
> > Cheers,
> > Jake
> >
> > --
> >
> > *Jacob Ferriero*
> >
> > Strategic Cloud Engineer: Data Engineering
> >
> > [email protected]
> >
> > 617-714-2509
> >
>
> --
>
> Jarek Potiuk
> Polidea | Principal Software Engineer
>
> M: +48 660 796 129
>
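
To make the stable-ID idea in this thread concrete, here is a minimal sketch of reconciling stored errors against the errors found in one scheduler loop, instead of clearing and re-adding everything. It is not Airflow code: `ParseError`, `stable_id`, and `reconcile` are hypothetical names, the in-memory dict stands in for the database table, and keying on file path plus exception class name is an assumption (the thread suggests a file hash plus exception class name, and a DAG ID is not always available).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ParseError:
    """Hypothetical row: one parse/import error for one DAG file."""
    filename: str            # path to the DAG file; a DAG ID may not exist
    exception_class: str     # e.g. "ImportError", "SyntaxError"
    stacktrace: str
    email_sent: bool = False  # extra metadata that must survive across loops


def stable_id(filename: str, exception_class: str) -> str:
    """Assumed stable key: SHA-256 of file path plus exception class name."""
    raw = f"{filename}\0{exception_class}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def reconcile(stored: dict, found: list) -> tuple:
    """Update `stored` in place and return (added_ids, deleted_ids).

    Rows whose ID appears in both sets are kept, so metadata such as
    `email_sent` is not lost; only their stacktrace is refreshed.
    """
    found_by_id = {stable_id(e.filename, e.exception_class): e for e in found}

    added = sorted(found_by_id.keys() - stored.keys())
    deleted = sorted(stored.keys() - found_by_id.keys())
    unchanged = found_by_id.keys() & stored.keys()

    for key in deleted:
        del stored[key]
    for key in unchanged:
        stored[key].stacktrace = found_by_id[key].stacktrace
    for key in added:
        stored[key] = found_by_id[key]

    return added, deleted
```

With this shape, an alert would only need to be sent for rows where `email_sent` is still false, which is one way the existing table could carry the extra state without a separate history table.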
