Yup, just worked out the same thing.

I think, as "punishment" for finding bugs so late in two RCs (this, and 1.9), 
I should run the release process for the next one.

-ash

> On 5 Aug 2018, at 22:05, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Yeah, I figured it out. Originally I was using a different implementation of 
> UTCDateTime, but that was unmaintained. I switched, but this version changed 
> or has a different contract: while it converts to UTC on storing, it does 
> not do so when it receives timezone-aware fields from the DB. Hence the issue.
> 
> I will prepare a PR that removes the dependency and implements our own 
> extension of DateTime. Probably tomorrow.
> 
> Good catch! Just in time :-(.
> 
> B.
> 
>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> 
>> wrote:
>> 
>> Entirely possible, though I wasn't even dealing with the scheduler - the 
>> issue I was addressing was entirely in the webserver for a pre-existing Task 
>> Instance.
>> 
>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
>> isn't working right / as expected. This line: 
>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>> doesn't look right for us - as you mentioned, the TZ is set to something 
>> (rather than having no TZ value).
>> 
>> Some background on how Postgres handles TZs: it always returns DTs in the 
>> TZ of the connection. I'm not sure if this is unique to Postgres or if 
>> other DBs behave the same.
>> 
>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 01:00:00+01
>> 
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 01:00:00+01
>> 
>> The server will always return TZs in the connection timezone.
>> 
>> postgres=# set timezone=utc;
>> SET
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 00:00:00+00
>> (1 row)
>> 
>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 00:00:00+00
>> (1 row)
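The same behaviour can be reproduced in plain Python, to show these are two renderings of one instant (a stdlib-only illustration mirroring the psql output above):

```python
from datetime import datetime, timedelta, timezone

# One instant, rendered in two "connection" timezones: +02 and +00 are
# different spellings of the same moment in time.
utc = datetime(2018, 8, 3, 0, 0, tzinfo=timezone.utc)
cest = utc.astimezone(timezone(timedelta(hours=2)))

assert utc == cest  # equal as points in time
print(cest.isoformat())                           # 2018-08-03T02:00:00+02:00
print(cest.astimezone(timezone.utc).isoformat())  # 2018-08-03T00:00:00+00:00
```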
>> 
>> 
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>> This is the issue:
>>> 
>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>>> 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>> 
>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, 
>>> name=None)
>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>> 
>>> Notice in lines 1+2 that the next run date is correctly in UTC, but from 
>>> the DB it gets a +2. In the next bit (lines 3+4) we get a 
>>> psycopg2.tz.FixedOffsetTimezone which should be set to UTC according to the 
>>> specs of https://github.com/spoqa/sqlalchemy-utc , but it isn’t. 
>>> 
>>> So changing your setting of the DB to UTC fixes the symptom but not the 
>>> cause.
>>> 
>>> B.
>>> 
>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor 
>>>> <ash_airflowl...@firemirror.com> wrote:
>>>> 
>>>> Sorry for being terse before.
>>>> 
>>>> So the issue is that the ts loaded from the DB is not in UTC; it's in 
>>>> GB/+01 (the default of the DB server).
>>>> 
>>>> For me, on a currently running 1.9 (no TZ) db:
>>>> 
>>>> airflow=# select * from task_instance;
>>>> get_op            | example_http_operator | 2018-07-23 00:00:00
>>>> 
>>>> This date time appears in the log url, and the path it looks at on S3 is 
>>>> 
>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>> 
>>>> If my postgres server has a default timezone of GB (which the one running 
>>>> on my laptop does), and I then apply the migration, then it is converted 
>>>> to that local time.
>>>> 
>>>> airflow=# select * from task_instance;
>>>> get_op            | example_http_operator | 2018-07-23 01:00:00+01
>>>> 
>>>> airflow=# set timezone=UTC;
>>>> airflow=# select * from task_instance;
>>>> get_op            | example_http_operator | 2018-07-23 00:00:00+00
>>>> 
>>>> 
>>>> This is all okay so far. The migration has kept the column at the same 
>>>> moment in time.
>>>> 
>>>> The issue comes when the UI tries to display logs for this old task: 
>>>> because the timezone of the connection is not UTC, PG returns a date with 
>>>> a +01 TZ. Thus after the migration this old task tries to look for a log 
>>>> file of
>>>> 
>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>> 
>>>> which doesn't exist - it's changed the time it has rendered from midnight 
>>>> (in v1.9) to 1am (in v1.10).
>>>> 
>>>> (This is with my change to log_filename_template from UPDATING.md in my 
>>>> other branch)
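The mismatch can be shown with a toy version of the render step (a hypothetical illustration; the template string here is made up, not Airflow's actual log_filename_template code):

```python
from datetime import datetime, timedelta, timezone

# The same execution_date renders to two different log paths depending on
# the tzinfo the DB connection handed back.
template = ".../example_http_operator/{ts}/1.log"
utc = datetime(2018, 7, 23, 0, 0, tzinfo=timezone.utc)
local = utc.astimezone(timezone(timedelta(hours=1)))

print(template.format(ts=utc.isoformat()))    # ...T00:00:00+00:00/1.log
print(template.format(ts=local.isoformat()))  # ...T01:00:00+01:00/1.log
```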
>>>> 
>>>> Setting the timezone to UTC per connection means the behaviour of Airflow 
>>>> doesn't change depending on how the server is configured.
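One way to do that with SQLAlchemy is a connect-time event listener (a sketch under assumptions: `utc_engine` and `set_utc` are hypothetical names, and the SET statement is Postgres-specific):

```python
from sqlalchemy import create_engine, event


def set_utc(dbapi_conn, conn_record):
    # Runs once per new DBAPI connection, before the pool hands it out.
    cur = dbapi_conn.cursor()
    cur.execute("SET timezone TO 'UTC'")  # Postgres-specific statement
    cur.close()


def utc_engine(url):
    """Engine whose connections report timestamps with a +00 offset,
    regardless of the server's default timezone."""
    engine = create_engine(url)
    event.listens_for(engine, "connect")(set_utc)
    return engine
```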
>>>> 
>>>> -ash
>>>> 
>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>> 
>>>>> Digging in a bit further. 
>>>>> 
>>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>>> 
>>>>> is the format
>>>>> 
>>>>> ts = execution_date.isoformat() and should be in UTC, afaik.
>>>>> 
>>>>> something is weird tbh.
>>>>> 
>>>>> B.
>>>>> 
>>>>> 
>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>> 
>>>>>> Ash,
>>>>>> 
>>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch and 
>>>>>> the analysis below, I am not sure what you are perceiving as an issue.
>>>>>> 
>>>>>> For conversion we assume everything is stored in UTC and in a naive 
>>>>>> format. Conversion then adds the timezone information. This results in 
>>>>>> the following:
>>>>>> 
>>>>>> postgres timezone = “Europe/Amsterdam”
>>>>>> 
>>>>>> 
>>>>>> airflow=# select * from task_instance;
>>>>>> get_op            | example_http_operator | 2018-07-27 02:00:00+02
>>>>>> 
>>>>>> airflow=# set timezone=UTC;
>>>>>> airflow=# select * from task_instance;
>>>>>> get_op            | example_http_operator | 2018-07-27 00:00:00+00
>>>>>> 
>>>>>> If we don’t set the timezone on the connection, postgres assumes the 
>>>>>> server timezone (in my case “Europe/Amsterdam”). So every datetime 
>>>>>> Airflow receives will be in “Europe/Amsterdam”. However, as we defined 
>>>>>> the model to use UTCDateTime, it will always convert the returned 
>>>>>> DateTime to UTC.
>>>>>> 
>>>>>> If we have configured Airflow to support something other than UTC as 
>>>>>> the default timezone, or a DAG has an associated timezone, we only 
>>>>>> convert to that timezone when calculating the next runtime (not for 
>>>>>> cron, btw). Nowhere else, and thus we are UTC everywhere.
>>>>>> 
>>>>>> What do you think is inconsistent?
>>>>>> 
>>>>>> Bolke
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor 
>>>>>>> <ash_airflowl...@firemirror.com> wrote:
>>>>>>> 
>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to 
>>>>>>> timezone-aware columns in the task instance is right, or at least it's 
>>>>>>> not what I expected.
>>>>>>> 
>>>>>>> Before weren't all TZs from schedule dates etc in UTC? For the same 
>>>>>>> task instance (these outputs from psql directly):
>>>>>>> 
>>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>> 
>>>>>>> **Okay the migration is fine**. It appears that the migration has done 
>>>>>>> the right thing, but my local DB I'm testing with has a Timezone of GB 
>>>>>>> set, so Postgres converts it to that TZ on returning an object.
>>>>>>> 
>>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to 
>>>>>>> have consistent behaviour? Is this somehow possible? I don't know 
>>>>>>> SQLAlchemy that well.
>>>>>>> 
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
>>>>>>>> <ash_airflowl...@firemirror.com> wrote:
>>>>>>>> 
>>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now 
>>>>>>>> always being "task" (was "s3.task" before). Logging config is much 
>>>>>>>> simpler now, though. This may be particular to my logging config, but 
>>>>>>>> given how much of a pain it was to set up S3 logging in 1.9, I have 
>>>>>>>> shared my config with some people in the Gitter chat, so it's not just 
>>>>>>>> me.
>>>>>>>> 
>>>>>>>> 2) The path that log-files are written to in S3 has changed (again - 
>>>>>>>> this happened from 1.8 to 1.9). I'd like to avoid having to move all 
>>>>>>>> of my log files again to continue viewing them. The change is that the 
>>>>>>>> path now (in 1.10) has a timezone in it, and the date is in local 
>>>>>>>> time, before it was UTC:
>>>>>>>> 
>>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>> 
>>>>>>>> We can possibly get away with an updating note about this to set a 
>>>>>>>> custom log_filename_template. Testing this now.
>>>>>>>> 
>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <a...@firemirror.com> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> -1(binding) from me.
>>>>>>>>> 
>>>>>>>>> Installed with:
>>>>>>>>> 
>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>>>>>>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>>>>>>>> 
>>>>>>>>> Install went fine.
>>>>>>>>> 
>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing as there is now 
>>>>>>>>> a hard dependency on the Kubernetes client libs, but the `emr` group 
>>>>>>>>> doesn't mention this.
>>>>>>>>> 
>>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>> 
>>>>>>>>> I see two options for this: either conditionally enable k8s:// 
>>>>>>>>> support if the import works, or (less preferred) add kube-client to 
>>>>>>>>> the emr deps.
>>>>>>>>> 
>>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>> 
>>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>> 
>>>>>>>>> -ash
>>>>>>>>> 
>>>>>>>>> (Normally no time at home due to new baby, but I got a standing desk, 
>>>>>>>>> and a carrier meaning she can sleep on me and I can use my laptop. 
>>>>>>>>> Win!)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Bump. 
>>>>>>>>>> 
>>>>>>>>>> Committers please cast your vote. 
>>>>>>>>>> 
>>>>>>>>>> B.
>>>>>>>>>> 
>>>>>>>>>> Sent from my iPhone
>>>>>>>>>> 
>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> +1 Binding
>>>>>>>>>>> 
>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>> 
>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>> 
>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <bdbr...@gmail.com>:
>>>>>>>>>>> 
>>>>>>>>>>>> Hey all,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the 
>>>>>>>>>>>> release,
>>>>>>>>>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>> 
>>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>> 
>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source 
>>>>>>>>>>>> release that
>>>>>>>>>>>> comes with INSTALL instructions.
>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>>>>>>>>>>>> "sdist"
>>>>>>>>>>>> release.
>>>>>>>>>>>> 
>>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>> 
>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the
>>>>>>>>>>>> changelog.
>>>>>>>>>>>> Since RC2 the following has been fixed:
>>>>>>>>>>>> 
>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>> 
>>>>>>>>>>>> Please note that the version number excludes the `rcX` string as 
>>>>>>>>>>>> well
>>>>>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0. This will 
>>>>>>>>>>>> allow us
>>>>>>>>>>>> to rename the artifact without modifying the artifact checksums 
>>>>>>>>>>>> when we
>>>>>>>>>>>> actually release.
>>>>>>>>>>>> 
>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set
>>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>>>>>>>>>> installing or upgrading. We will try to remove this requirement 
>>>>>>>>>>>> for the
>>>>>>>>>>>> next release.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Bolke
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
