On 14/06/2021 at 18:47, Wes McKinney wrote:
On Mon, Jun 14, 2021 at 11:33 AM Antoine Pitrou <anto...@python.org> wrote:


On 14/06/2021 at 18:28, Wes McKinney wrote:
Hi Antoine — when there is no time zone specified, I do not think it is
appropriate to consider the data to refer to a specific moment in time
without applying an explicit time zone localization.

Well, how can that be done? The timezone information is lost, how can
the user (who possibly got the data from another source) recover it?


This is usually something that people take care of in their application
code. For example, when you parse a CSV and obtain “raw” timestamps, you
have to call “tz_localize” to apply a time zone to them and normalize the
internal representation to UTC.
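
As an illustration of that localization step, here is a minimal sketch using only the Python standard library (pandas' "tz_localize" plays the same role for a whole column). The fixed UTC+2 source offset and the sample strings are assumptions made up for the example, not something taken from real data.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the raw values were recorded at a fixed UTC+2 offset.
SOURCE_TZ = timezone(timedelta(hours=2))

# Hypothetical "raw" timestamps, as they might come out of a CSV file.
raw = ["2021-06-14 18:47:00", "2021-06-15 00:30:00"]

# Parsing yields naive datetimes: wall-clock readings with no zone attached.
naive = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]

# "Localize": attach the known zone, then normalize to UTC.
localized_utc = [
    d.replace(tzinfo=SOURCE_TZ).astimezone(timezone.utc) for d in naive
]

for d in localized_utc:
    print(d.isoformat())
```

Only after the zone is attached do the values refer to specific moments in time; before that they are just wall-clock readings.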

Right, this is why I advocate for doing this at the boundary layer, i.e., the CSV, Parquet, etc. readers would expose an option to set the timezone of timestamp columns to a well-defined value.

If you don’t know what the time zone is supposed to be then you can’t get
it back, but you can still do many analytical operations on the data
(aggregating by year or month, for example) just fine. For many users the
absence of time zones is a non-issue in their work.
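
A small sketch of the kind of aggregation meant here, assuming a handful of made-up naive timestamps: counting rows per (year, month) needs no time zone at all, since it only looks at the wall-clock fields.

```python
from collections import Counter
from datetime import datetime

# Hypothetical naive timestamps (no tzinfo attached).
naive = [
    datetime(2021, 5, 30, 9, 15),
    datetime(2021, 6, 1, 12, 0),
    datetime(2021, 6, 14, 18, 47),
]

# Group by the wall-clock year and month; no localization is involved.
per_month = Counter((d.year, d.month) for d in naive)

for (year, month), count in sorted(per_month.items()):
    print(year, month, count)
```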

So, basically, a timestamp without a timezone is still useful as a date (mostly, because the day number may be off)?

But then, why don't we tell users to simply use a date type for such data?
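
To make the caveat about the day number concrete, here is a minimal sketch assuming one made-up naive timestamp and two hypothetical source offsets. Depending on which zone the wall-clock reading actually came from, the corresponding UTC date differs.

```python
from datetime import datetime, timedelta, timezone

# A naive wall-clock reading late in the evening (made-up value).
naive = datetime(2021, 6, 14, 23, 30)

# Two hypothetical interpretations of where it was recorded.
as_utc_minus_5 = naive.replace(tzinfo=timezone(timedelta(hours=-5)))
as_utc_plus_9 = naive.replace(tzinfo=timezone(timedelta(hours=9)))

# The same reading maps to different calendar days once normalized to UTC.
date_if_minus_5 = as_utc_minus_5.astimezone(timezone.utc).date()
date_if_plus_9 = as_utc_plus_9.astimezone(timezone.utc).date()

print(date_if_minus_5, date_if_plus_9)
```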
