Responding to some of Bolke's concerns in the github PR for this change:

> Mmm still not convinced. Especially on elastic search it is just easier
to use the start_date to shard on.
Sharding on start_date isn't great because there is still some risk of
collisions, and it couples the primary key to start_date unnecessarily
(hypothetically, Airflow could one day allow two tasks to run at the
same time, in which case start_date would no longer be a valid primary
key). Using monotonically increasing IDs for DB entries like this is
pretty standard practice.
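
To make the comparison concrete, here is a minimal sketch of what I mean
by a surrogate, monotonically increasing key, written as a SQLAlchemy-style
declarative model (the table and column names are illustrative only, not
the actual task_instance schema):

    from sqlalchemy import Column, DateTime, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class TaskLogEntry(Base):
        __tablename__ = 'task_log_entry'  # hypothetical table, for illustration

        # Surrogate, monotonically increasing key: stays unique even if two
        # runs ever start at the same instant.
        id = Column(Integer, primary_key=True, autoincrement=True)

        dag_id = Column(String(250))
        task_id = Column(String(250))
        execution_date = Column(DateTime)
        # start_date is just an attribute; a collision here breaks nothing.
        start_date = Column(DateTime)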

> In addition I'm very against the managing of log files this way. Log
files are already a mess and should be refactored to be consistent and to
be managed from one place.

I agree about the logging mess. There have been efforts to fix it, but
they have all been abandoned, so we decided to move ahead with this
change. I still need to take a look at the PR, but this change should
actually make logging less messy: it adds an abstraction for logging
modules, and the file path tells you exactly which try numbers ran (and
how many), and on which workers. The log folder structure already
roughly mimicked the primary key of the task_instance table (dag_id +
task_id + execution_date), but try_number logically belongs in that key
as well, at least for log files.
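
To make that concrete, here is a rough sketch of the kind of layout I
mean (the helper and path template are illustrative, not the exact paths
in the PR):

    import os

    # Hypothetical helper, for illustration only: one log file per attempt,
    # keyed the same way as the task_instance row plus the try number.
    def task_log_path(base_dir, dag_id, task_id, execution_date, try_number):
        return os.path.join(
            base_dir,
            dag_id,
            task_id,
            execution_date.isoformat(),
            '{}.log'.format(try_number),
        )

    # e.g. logs/example_dag/example_task/2017-06-21T00:00:00/1.log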


> The docker packagers can already not package airflow correctly without
jumping through hoops. Arbitrarily naming it certainly does not help here.

If this is referring to the /<ATTEMPT #>/ in the path, I don't think
that's arbitrary naming. A log "unit" really should be a single task run
(not an arbitrary grouping of a variable number of runs), and each unit
should have a unique key and location. One of the reasons we are working
on this effort is to make Airflow play nicer with Kubernetes/Docker
(since Airflow workers should ideally be ephemeral): a separate service
needs to be able to read and ship the logs, because they will otherwise
be destroyed along with the worker instance. In the future I think we
should also allow custom logging modules (e.g. writing logs directly to
some service).
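
As a rough sketch of what such a pluggable logging module could look
like (the handler class and its client.send() call are hypothetical,
just to illustrate the shape of the abstraction):

    import logging

    class RemoteServiceHandler(logging.Handler):
        """Hypothetical handler that ships task logs to an external service."""

        def __init__(self, client):
            super(RemoteServiceHandler, self).__init__()
            self.client = client  # e.g. an HTTP or Elasticsearch client

        def emit(self, record):
            try:
                # Forward the formatted line; the worker's local copy can then
                # disappear with the container without losing the logs.
                self.client.send(self.format(record))
            except Exception:
                self.handleError(record)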



On Wed, Jun 21, 2017 at 3:11 PM, Allison Wang <allisonwang...@gmail.com>
wrote:

> Hi,
>
> I am in the process of making Airflow logging backed by Elasticsearch
> (for more detail, please check AIRFLOW-1325
> <https://issues.apache.org/jira/browse/AIRFLOW-1325>). Here are several
> more logging improvements we are considering:
>
> *1. Log streaming.* Auto-refresh the logs if tasks are running.
>
> *2. Separate logs by attempts.*
> [image: Screen Shot 2017-06-21 at 2.49.11 PM.png]
> Instead of logging everything into one file, logs can be separated by
> attempt number and displayed using tabs. Attempt number here is a
> monotonically increasing number that represents each task instance run
> (unlike try_number, clearing the task instance won't reset the attempt
> number).
> *try_number:* the n-th retry of the task instance. try_number should not
> exceed retries. Clearing the task resets try_number to 0.
> *attempt:* the number of times the current task instance has been executed.
>
> *3. Collapsible logs.* Collapse logs that are mainly for debugging
> Airflow internals and aren't really related to users' tasks (for example,
> logs shown before "starting attempt 1 of 1").
>
> All suggestions are welcome.
>
> Thanks,
> Allison
>
