Hi David et al,
I was quite convinced by "Dynamic Task Sharding" during the call because:
* Dynamic Task Mapping - we all know it
* Dynamic Task Iteration - the new async kid in town? It takes everything
into a single execution (with the risk of all-or-nothing failure...)
Given the way David described putting the iterations into
(partitions/slices/chunks), I am still up for it.
Batching would also be okay, but it feels like a better match for what
"Iteration" already does: looping asynchronously over a list. The naming
discussion was more about this: if you have 17 000 items in the list, you
probably want to track 170 "batches/partitions" as supervised task
processes, each running 100 list items. As the "batch" is the full 17 000
items, naming the "split/partitioning" a "batch" sounds a bit unnatural,
because the previous "iteration" was also a bit of a batch.
Or do I misinterpret?
Dynamic Task Mapping:
    items = make_me_a_work_list()
    serious_work = PythonOperator.partial(
        task_id="serious_work",
        ...
    ).expand(op_args=items)
Dynamic Task Iteration:
    async_work = PythonOperator.partial(
        task_id="async_work",
        ...
    ).iterate(op_args=items)
Dynamic Task Iteration with "partitions/slices"?
    large_async_work_in_pieces = PythonOperator.partial(
        task_id="large_async_work_in_pieces",
        ...
    ).iterate(op_args=items, shard=170)

    large_async_work_in_pieces = PythonOperator.partial(
        task_id="large_async_work_in_pieces",
        ...
    ).iterate(op_args=items, slice=170)

    large_async_work_in_pieces = PythonOperator.partial(
        task_id="large_async_work_in_pieces",
        ...
    ).iterate(op_args=items, batch=100)
(Okay, reading the code: "partition", "shard" or "slice" would describe
how many pieces to cut the elephant into, and "batch" would be convincing
for telling how many tasks to put together sharing a loop... so,
thinking out loud, "batch" would also be OK if we want to describe the
"package size of the elephant slice".)
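To make the difference between "how many pieces" and "how many items per piece" concrete, here is a minimal plain-Python sketch. It does not use Airflow at all, and the helper names split_by_piece_count and split_by_batch_size are made up purely for illustration; they are not part of any proposed API.

```python
def split_by_batch_size(items, batch_size):
    """Cut items into consecutive pieces of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def split_by_piece_count(items, pieces):
    """Cut items into exactly `pieces` consecutive pieces of near-equal size."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    base, extra = divmod(len(items), pieces)
    result, start = [], 0
    for index in range(pieces):
        # The first `extra` pieces take one additional item each.
        size = base + (1 if index < extra else 0)
        result.append(items[start:start + size])
        start += size
    return result


items = list(range(17_000))
by_size = split_by_batch_size(items, 100)    # "batch=100" reading
by_count = split_by_piece_count(items, 170)  # "shard=170" / "slice=170" reading
print(len(by_size), len(by_count))           # 170 170

# The two readings only diverge when the list does not divide evenly.
uneven = list(range(17_050))
print(len(split_by_batch_size(uneven, 100)))   # 171 (last piece has 50 items)
print(len(split_by_piece_count(uneven, 170)))  # 170 (some pieces have 101 items)
```

With 17 000 items both readings give the same 170 pieces of 100, so the name mostly signals which number the user is expected to pass in.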
@David ... if I misunderstood, can you share the PR link or the demo
code so I can re-read what you presented?
Jens
On 04.06.26 19:54, Tzu-ping Chung via dev wrote:
I think dynamic task batch(ing) would be reasonable.
Python’s itertools has batched(), which is roughly the same concept.
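For reference, a small sketch of what itertools.batched() does with the 17 000 item example from this thread. batched() is only available from Python 3.12, so the fallback below is an assumption added for older interpreters and is not the standard library implementation.

```python
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Fallback sketch with the same basic behaviour for older Pythons."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        while batch := tuple(islice(iterator, n)):
            yield batch


items = range(17_000)
batches = list(batched(items, 100))

print(len(batches))      # 170
print(batches[0][:3])    # (0, 1, 2)
print(len(batches[-1]))  # 100
```

Note that the argument is the size of each batch, not the number of batches, and the last batch may be shorter when the input does not divide evenly.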
TP
On 5 Jun 2026, at 00:56, Blain David<[email protected]> wrote:
Hi all,
We need a better name than partition for Dynamic Task Partitioning.
The main issue is that partition already strongly suggests asset/data
partitions in Airflow,
so using the same word here creates avoidable confusion for users and
contributors.
We’d like a term that is clear, intuitive, and doesn’t overlap with existing
Airflow concepts.
Some alternatives raised so far during the devcall:
* batch (e.g. Dynamic Task Batching)
* chunk (e.g. Dynamic Task Chunking)
* slice (a bit confusing, but I chose to still mention it anyway)
* shard
* segment
I currently lean towards chunk and batch. Both feel familiar, are readable in
code and docs, and avoid the existing partition/data-partition association.
I’d love feedback on:
* which term feels most natural
* which term is least ambiguous
* whether there’s a better option we haven’t considered
One note: map was mentioned as well, but that seems too close to existing
task.map() terminology.
Please share thoughts, especially if you have concerns about any of the options
above or a stronger suggestion for the long-term name.
Naming is indeed hard 🙂
Kind regards,
David