Hi Esteban,

Thanks for this detailed feedback, this is most interesting.

I don't have enough time to look at this in detail, I will do so later, but 
here is some initial feedback.

Note that PostreSQL supports many different date/time formats and that the user 
can set preferences per connection, hence some flexibility in parsing is 
probably needed, and that is one reason why the parsing is implemented the way 
it is.

Check section 8.5.2. from 
https://www.postgresql.org/docs/13/datatype-datetime.html

Note that all of the following give the same result:

Date readFrom: 'March 2 2020' readStream.
Date readFrom: '3/2/2020' readStream.
Date readFrom: '2020-03-02' readStream.

Date parsing is probably slow because it has to deal with all these formats.

Of course, parsing a fixed well-known format is always faster.

The timezones are tricky too, as preferences can be set per connection, so we 
have to be careful.

I will come back to this.

Sven 

> On 12 Oct 2020, at 15:04, Esteban Maringolo <emaring...@gmail.com> wrote:
> 
> Hi,
> 
> I recently stumbled into some performance issue I had, so I profiled
> it and got it fixed, but by doing that I also noticed a huge time
> spent when reading DateAndTimes and Dates (which also uses the former)
> and the culprit seems to be `Date class>>#readFrom:`.
> 
> Digging into this, I noticed that this was called from
> `DateAndTime>>#readFrom:`, but in particular 90% of the time spent was
> when instantiating the Date part of it, because it does unnecessary
> lookup of the index of Month when input stream only have a month
> index.
> 
> What I did then was to change the default behavior of DateAndTime
> class>>#readFrom:
> 
> readFrom: aStream defaultOffset: defaultOffset
> "Parse and return a new DateAndTime instance from stream,
> as a Date, an optional Time and an optional TimeZone offset.
> The time defaults to midnight, the timezone to defaultOffset"
> "self readFrom: '2013-03-04T23:47:52.876+01:00' readStream"
> 
> "From this"
> date := Date readFrom: aStream.
> 
> "To this"
> date := Date readFrom: (aStream next: 10) pattern: 'yyyy-mm-dd'.
> 
> And also in P3Converter to specify the pattern directly.
> 
> P3Converter>>#convertDateFrom: bytes length: length description: description
> ^Date readFrom: (self asciiStreamFor: bytes length: length) pattern:
> 'yyyy-mm-dd'
> 
> With these two changes I got a 8x-10x speedup when reading
> Date/DateAndTime from the database.
> 
> Then I dig a little further and in P3Converter added a little check to
> avoid translating the offset of a DateAndTime when it is not needed.
> 
> P3Converter>>#convertDateAndTimeWithoutTimezoneFrom: bytes length:
> length description: description
> "TIMESTAMP WITHOUT TIME ZONE (TIMESTAMP) is stored internally in
> Postgres the way it was inserted, its representation remains constant,
> with no offset added. We use the timezone of the connection to do the
> necessary shifting. This assumes that the timezones used during
> insert/update and query are the same."
> 
>  | timestamp offset |
>  timestamp := DateAndTime readFrom: (self asciiStreamFor: bytes
> length: length) defaultOffset: Duration zero.
>  offset := self timezone offsetForTimestamp: timestamp.
>  ^offset isZero "<-- added this check to avoid translation"
>    ifTrue: [ timestamp ]
>    ifFalse: [ timestamp translateTo: offset ]
> 
> And the speedup went even higher. But I don't know whether this last
> part is correct.
> 
> Of course this seems as a quick and dirty fix, but I don't know
> whether DateAndTime>>readFrom: should read any format or just ISO 8601
> formats nor whether the Date/DateAndTime coming from PostgreSQL are
> always formatted that way. I ran my app's small test suite and haven't
> had any issues, neither during development with these changes applied.
> 
> I think there is a significant slowdown involved in the instantiation
> of Dates that should be fixed, whether this fix is definitive or not
> is why I'm asking here.
> 
> Regards!
> 
> Esteban A. Maringolo

Reply via email to