Responding to some of Bolke's concerns in the GitHub PR for this change:

> Mmm still not convinced. Especially on elastic search it is just easier to
> use the start_date to shard on.

Sharding on start_date isn't great because there is still some risk of
collisions, and it means we are coupling the primary key to start_date
unnecessarily (e.g. hypothetically, if Airflow ever allowed two runs of the
same task to start at the same time, start_date would no longer be a valid
primary key). Using monotonically increasing IDs for DB entries like this is
pretty standard practice.
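To make the collision point concrete, here is a hypothetical sketch (the model
and column names are illustrative, not Airflow's actual schema): a
monotonically increasing surrogate id stays unique no matter when attempts
start, whereas a key that leans on start_date breaks as soon as two attempts
can share a timestamp.

```python
# Hypothetical sketch only -- illustrative names, not Airflow's real schema.
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TaskAttempt(Base):
    """One row per task-instance attempt."""

    __tablename__ = "task_attempt"

    # Monotonically increasing surrogate key: unique regardless of timing, so
    # log files or Elasticsearch documents can be keyed on it directly.
    id = Column(Integer, primary_key=True, autoincrement=True)

    dag_id = Column(String(250), nullable=False)
    task_id = Column(String(250), nullable=False)
    execution_date = Column(DateTime, nullable=False)

    # Informational only; if this were part of the key, two attempts starting
    # in the same instant would collide.
    start_date = Column(DateTime)
```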
> In addition I'm very against the managing of log files this way. Log files
> are already a mess and should be refactored to be consistent and to be
> managed from one place.

I agree about the logging mess, and there seem to have been efforts to fix
this, but they have all been abandoned, so we decided to move ahead with this
change. I need to take a look at the PR first, but this change should actually
make logging less messy, since it should add an abstraction for logging
modules, and because you can tell exactly which try numbers (and how many) ran
on which workers from the file path. The log folder structure already roughly
mimicked the primary key of the task_instance table (dag_id + task_id +
execution_date), but try_number logically belongs in this key as well (at
least for the key for log files).

> The docker packagers can already not package airflow correctly without
> jumping through hoops. Arbitrarily naming it certainly does not help here.

If this is referring to the /<ATTEMPT #>/ in the path, I don't think this is
arbitrary naming. A log "unit" really should be a single task run (not an
arbitrary grouping of a variable number of runs), and each unit should have a
unique key or location (a sketch of such a path layout is at the end of this
message). One of the reasons we are working on this effort is to make Airflow
play nicer with Kubernetes/Docker (since Airflow workers should ideally be
ephemeral), and letting a separate service read and ship the logs is necessary
in that case because the logs are destroyed along with the worker instance. I
think in the future we should also allow custom logging modules (e.g. writing
logs directly to some service).

On Wed, Jun 21, 2017 at 3:11 PM, Allison Wang <allisonwang...@gmail.com>
wrote:

> Hi,
>
> I am in the process of making airflow logging backed by Elasticsearch
> (for more detail please check AIRFLOW-1325
> <https://issues.apache.org/jira/browse/AIRFLOW-1325>). Here are several
> more logging improvements we are considering:
>
> *1. Log streaming.* Auto-refresh the logs if tasks are running.
>
> *2. Separate logs by attempts.*
> [image: Screen Shot 2017-06-21 at 2.49.11 PM.png]
> Instead of logging everything into one file, logs can be separated by
> attempt number and displayed using tabs. Attempt number here is a
> monotonically increasing number that represents each task instance run
> (unlike try_number, clearing a task instance won't reset the attempt
> number).
> *try_number:* n-th retry by the task instance. try_number should not be
> greater than retries. Clearing the task will set try_number to 0.
> *attempt:* number of times the current task instance got executed.
>
> *3. Collapsible logs.* Collapse logs that are mainly for debugging
> Airflow internals and aren't really related to users' tasks (for example,
> logs shown before "starting attempt 1 of 1").
>
> All suggestions are welcome.
>
> Thanks,
> Allison
>
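As mentioned above, here is a minimal sketch of the per-attempt log path
layout being discussed. The template, helper name, and base directory are
assumptions for illustration, not the final implementation:

```python
# Minimal sketch: one log location per task-instance attempt. The path mirrors
# the task_instance primary key (dag_id, task_id, execution_date) plus
# try_number, so each attempt maps to exactly one file that a separate
# collector can ship off an ephemeral worker.
import os
from datetime import datetime

LOG_FILENAME_TEMPLATE = "{dag_id}/{task_id}/{execution_date}/{try_number}.log"


def task_log_path(log_base, dag_id, task_id, execution_date, try_number):
    """Build a unique log location for a single task-instance attempt."""
    relative = LOG_FILENAME_TEMPLATE.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=execution_date.isoformat(),
        try_number=try_number,
    )
    return os.path.join(log_base, relative)


# Example (illustrative values):
#   task_log_path("/usr/local/airflow/logs", "example_dag", "example_task",
#                 datetime(2017, 6, 21, 15, 0), 1)
#   -> /usr/local/airflow/logs/example_dag/example_task/2017-06-21T15:00:00/1.log
```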