Sorry for being terse before.

So the issue is that the ts loaded from the DB is not in UTC; it's in GB/+01 
(the default timezone of the DB server).

For me, on a currently running 1.9 (no TZ) db:

airflow=# select * from task_instance;
get_op            | example_http_operator | 2018-07-23 00:00:00

This date time appears in the log url, and the path it looks at on S3 is 

.../example_http_operator/2018-07-23T00:00:00/1.log

If my postgres server has a default timezone of GB (which the one running on my 
laptop does), and I then apply the migration, the value is returned in that 
local time.

airflow=# select * from task_instance;
get_op            | example_http_operator | 2018-07-23 01:00:00+01

airflow=# set timezone=UTC;
airflow=# select * from task_instance;
get_op            | example_http_operator | 2018-07-23 00:00:00+00


This is all okay so far. The migration has kept the column at the same moment 
in time.

The issue comes when the UI tries to display logs for this old task: because the 
timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus 
after the migration this old task tries to look for a log file of

.../example_http_operator/2018-07-23T01:00:00/1.log

which doesn't exist - it's changed the time it has rendered from midnight (in 
v1.9) to 1am (in v1.10).
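
To make the mismatch concrete, here is a minimal sketch using only the Python 
standard library. ts_segment is a made-up helper standing in for the 
isoformat-based rendering of ts mentioned further down the thread; it is not 
Airflow's own code, and the fixed +01:00 offset is an assumption standing in 
for GB during summer time.

```python
from datetime import datetime, timedelta, timezone


def ts_segment(execution_date):
    """Hypothetical helper: render the timestamp part of a log path."""
    return execution_date.isoformat()


# What a 1.9 database holds: a naive datetime that is implicitly UTC.
naive_utc = datetime(2018, 7, 23, 0, 0, 0)

# What a connection with a GB session timezone hands back after the
# migration: the same instant, expressed with a +01:00 offset.
bst = timezone(timedelta(hours=1))  # assumption: fixed summer-time offset
as_returned = naive_utc.replace(tzinfo=timezone.utc).astimezone(bst)

old_segment = ts_segment(naive_utc)
new_segment = ts_segment(as_returned)

# Converting back to UTC and dropping the offset recovers the 1.9 value.
normalised = ts_segment(
    as_returned.astimezone(timezone.utc).replace(tzinfo=None)
)

print(old_segment)  # 2018-07-23T00:00:00
print(new_segment)  # 2018-07-23T01:00:00+01:00
print(normalised)   # 2018-07-23T00:00:00
```

The two datetimes compare equal as instants; only the rendered text differs, 
and the rendered text is what ends up in the log path.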

(This is with my change to log_filename_template from UPDATING.md in my other 
branch)

Setting the timezone to UTC per connection means the behaviour of Airflow 
doesn't change depending on how the server is configured.
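
One possible way to do that with psycopg2 is to pass a libpq "options" string 
at connect time, so every new connection starts with its session timezone set 
to UTC. The sketch below only builds the arguments; utc_connect_args is a 
hypothetical helper, and this is not necessarily how the branch implements it.

```python
def utc_connect_args(existing=None):
    """Hypothetical helper: return connect args that pin the session to UTC.

    libpq accepts an "options" string of command-line style settings;
    "-c timezone=UTC" sets the timezone parameter for the session.
    """
    args = dict(existing or {})
    options = args.get("options", "")
    args["options"] = (options + " -c timezone=UTC").strip()
    return args


# Usage sketch (assumes SQLAlchemy and psycopg2 are installed; the
# connection URL variable is a placeholder, so this stays commented out):
#
#   engine = sqlalchemy.create_engine(
#       sql_alchemy_conn, connect_args=utc_connect_args()
#   )

print(utc_connect_args())
```

Any options already present are kept, and the timezone setting is appended 
after them.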

-ash

> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Digging in a bit further. 
> 
> {{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log
> 
> is the format
> 
> ts = execution_date.isoformat and should be in UTC afaik.
> 
> something is weird tbh.
> 
> B.
> 
> 
>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> Ash,
>> 
>> Reading your proposed changes on your “set-timezone-to-utc” branch and the 
>> analysis below, I am not sure what you are perceiving as an issue.
>> 
>> For conversion we assume everything is stored in UTC and in a naive format. 
>> Conversion then adds the timezone information. This results in the following
>> 
>> postgres timezone = “Europe/Amsterdam”
>> 
>> 
>> airflow=# select * from task_instance;
>> get_op            | example_http_operator | 2018-07-27 02:00:00+02
>> 
>> airflow=# set timezone=UTC;
>> airflow=# select * from task_instance;
>> get_op            | example_http_operator | 2018-07-27 00:00:00+00
>> 
>> If we don’t set the timezone in the connection postgres assumes server 
>> timezone (in my case “Europe/Amsterdam”). So every datetime Airflow receives 
>> will be in “Europe/Amsterdam” format. However as we defined the model to use 
>> UTCDateTime it will always convert the returned DateTime to UTC.
>> 
>> If we have configured Airflow to use something other than UTC as the 
>> default timezone, or a DAG has an associated timezone, we only convert to 
>> that timezone when calculating the next runtime (not for cron btw). Nowhere 
>> else, and thus we are UTC everywhere.
>> 
>> What do you think is inconsistent?
>> 
>> Bolke
>> 
>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> 
>>> wrote:
>>> 
>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone 
>>> aware columns in the task instance is right, or at least it's not what I 
>>> expected.
>>> 
>>> Before weren't all TZs from schedule dates etc in UTC? For the same task 
>>> instance (these outputs from psql directly):
>>> 
>>> before: execution_date=2017-09-04 00:00:00
>>> after: execution_date=2017-09-04 01:00:00+01
>>> 
>>> **Okay the migration is fine**. It appears that the migration has done the 
>>> right thing, but my local DB I'm testing with has a Timezone of GB set, so 
>>> Postgres converts it to that TZ on returning an object.
>>> 
>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have 
>>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy 
>>> that well.
>>> 
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
>>>> <ash_airflowl...@firemirror.com> wrote:
>>>> 
>>>> 1) Missing UPDATING note about the change of task_log_reader to now always 
>>>> being "task" (it was "s3.task" before). Logging config is much simpler now 
>>>> though. This may be particular to my logging config, but given how much of 
>>>> a pain it was to set up S3 logging in 1.9 I have shared my config with 
>>>> some people in the Gitter chat, so it's not just me.
>>>> 
>>>> 2) The path that log-files are written to in S3 has changed (again - this 
>>>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log 
>>>> files again to continue viewing them. The change is that the path now (in 
>>>> 1.10) has a timezone in it, and the date is in local time, before it was 
>>>> UTC:
>>>> 
>>>> before: 2018-07-23T00:00:00/1.log
>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>> 
>>>> We can possibly get away with an updating note about this to set a custom 
>>>> log_filename_template. Testing this now.
>>>> 
>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <a...@firemirror.com> wrote:
>>>>> 
>>>>> -1(binding) from me.
>>>>> 
>>>>> Installed with:
>>>>> 
>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>>>> 
>>>>> Install went fine.
>>>>> 
>>>>> Our DAGs that use SparkSubmitOperator are now failing as there is now a 
>>>>> hard dependency on the Kubernetes client libs, but the `emr` group 
>>>>> doesn't mention this.
>>>>> 
>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>> 
>>>>> I see two options for this: either conditionally enable k8s:// support 
>>>>> if the import works, or (less preferred) add kube-client to the emr 
>>>>> deps.
>>>>> 
>>>>> Sorry - this is the first time I've been able to test it.
>>>>> 
>>>>> I will install this dep manually and continue testing.
>>>>> 
>>>>> -ash
>>>>> 
>>>>> (Normally no time at home due to new baby, but I got a standing desk, and 
>>>>> a carrier meaning she can sleep on me and I can use my laptop. Win!)
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>> 
>>>>>> Bump. 
>>>>>> 
>>>>>> Committers please cast your vote. 
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>>>> 
>>>>>>> +1 Binding
>>>>>>> 
>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>> 
>>>>>>> Cheers, Fokko
>>>>>>> 
>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <bdbr...@gmail.com>:
>>>>>>> 
>>>>>>>> Hey all,
>>>>>>>> 
>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the
>>>>>>>> release, which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>> 
>>>>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>>>>> 
>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>> 
>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release
>>>>>>>> that comes with INSTALL instructions.
>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>>>>>>>> "sdist" release.
>>>>>>>> 
>>>>>>>> Public keys are available at:
>>>>>>>> 
>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>> 
>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the
>>>>>>>> changelog. Since RC2 the following has been fixed:
>>>>>>>> 
>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>> 
>>>>>>>> Please note that the version number excludes the `rcX` string as well
>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0. This will
>>>>>>>> allow us to rename the artifact without modifying the artifact
>>>>>>>> checksums when we actually release.
>>>>>>>> 
>>>>>>>> WARNING: Due to licensing requirements you will need to set
>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>>>>>> installing or upgrading. We will try to remove this requirement for the
>>>>>>>> next release.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Bolke
>>>>> 
>>>> 
>>> 
>> 
> 
