This is reasonable; it could be nice to have a generic way to replace operator kwargs with callables. In the meantime, you can try this hack, deriving an operator inline with your DAG definition. The callable receives the operator's context object, which is nice: it provides a handle on a lot of things defined here: https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1886 (the same as what's in the Jinja template context).
    class DerivedFooOperator(FooOperator):

        def _bar(self, context):
            # only gets evaluated at run time
            return datetime.datetime.now()

        def execute(self, context):
            self.bar = self._bar(context)
            super(DerivedFooOperator, self).execute(context)

If `bar` is a required arg, you'll have to pass a dummy static value on initialization, but it will get overwritten at runtime.

You can imagine having a more generic class mixin to do this. Or maybe BaseOperator could have a `kwarg_overrides_callables` attribute: a dict of string to callable that would execute somewhere between `__init__` and `execute` and do the magic. Or how about a `pre_execute(self, context): pass` method on BaseOperator as a nice hook to allow for this kind of thing without having to call `super`?

Max

On Mon, Aug 27, 2018 at 2:29 PM Victor Jimenez <vjime...@vistaprint.com> wrote:

> TL;DR: Is there any recommended way to lazily load input for Airflow
> operators?
>
> I could not find a way to do this. While I faced this limitation while
> using the Databricks operator, other operators might lack such
> functionality as well. Please keep reading for more details.
>
> ---
>
> When instantiating a DatabricksSubmitRunOperator (
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/databricks_operator.py),
> users need to pass the description of the job that will later be executed
> on Databricks.
>
> The job description is only needed at execution time (when the hook is
> called). However, the json parameter must already contain the full job
> description when the operator is constructed. This can be a problem if
> computing the job description requires expensive operations (e.g.,
> querying a database). The expensive operation will be invoked every single
> time the DAG is reprocessed (which may happen quite frequently).
>
> It would be good to have a mechanism equivalent to the python_callable
> parameter of the PythonOperator. In this way, users could pass a function
> that would generate the job description only when the operator is actually
> executed. I discussed this with Andrew Chen (from Databricks), and he
> agrees it would be an interesting feature to add.
>
> Does this sound reasonable? Is this use case supported in some way that I
> am unaware of?
>
> You can find the issue I created here:
> https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-2964
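[Editor's note: the "more generic class mixin" idea from Max's reply could be sketched roughly as below. This is a standalone illustration, not Airflow code: `FakeBaseOperator` is a minimal stand-in for `airflow.models.BaseOperator`, and the names `LazyKwargsMixin`, `lazy_kwargs`, and `resolve_lazy_kwargs` are hypothetical, not part of any real Airflow API. The pattern: kwargs that hold callables are invoked with the runtime context just before `execute`, so expensive values are only computed at run time, not at every DAG parse.]

```python
class FakeBaseOperator:
    """Minimal stand-in for Airflow's BaseOperator (illustration only)."""

    def __init__(self, **kwargs):
        # Airflow stores operator kwargs as instance attributes at DAG
        # parse time; we mimic just that part.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def execute(self, context):
        raise NotImplementedError


class LazyKwargsMixin:
    """Hypothetical mixin: resolve callable kwargs right before execution.

    Any attribute named in `lazy_kwargs` that holds a callable is invoked
    with the runtime context, and its return value replaces the callable.
    """

    lazy_kwargs = ()

    def resolve_lazy_kwargs(self, context):
        for name in self.lazy_kwargs:
            value = getattr(self, name)
            if callable(value):
                setattr(self, name, value(context))


class FooOperator(FakeBaseOperator):
    def execute(self, context):
        return self.bar


class LazyFooOperator(LazyKwargsMixin, FooOperator):
    lazy_kwargs = ("bar",)

    def execute(self, context):
        # Plays the role of the proposed pre-execute hook: callables are
        # resolved here, only when the task actually runs.
        self.resolve_lazy_kwargs(context)
        return super().execute(context)


# `bar` is a callable at construction time; it is only evaluated at run
# time, with access to the context (here just a fake one with `ds`).
op = LazyFooOperator(bar=lambda context: context["ds"])
result = op.execute({"ds": "2018-08-27"})
print(result)  # prints "2018-08-27"
```

Under this sketch, the Databricks use case would pass a callable that builds the job description (e.g., by querying a database) instead of the finished dict, deferring the expensive work from DAG parse time to task execution time.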