Re: DateTime performance

2012-05-04 Thread Philipp K. Janert
On Thursday 03 May 2012 02:14:45 you wrote:
> > From: Philipp K. Janert [mailto:jan...@ieee.org]
> > Sent: Wednesday, 2 May 2012 8:29 AM
> > 
> > Question:
> > 
> > When using DateTime for a large number of
> > instances, it becomes a serious performance
> > drag.
> 
> ...
> 
> > Is this expected behavior? And are there access
> > patterns that I can use to mitigate this effect?
> > (I tried to supply a time_zone explicitly, but that
> > does not seem to improve things significantly.)
> 
> Hi Philipp,
> 
> My #1 tip is to pre-prepare/cache the DateTime::TimeZone object and pass it
> in to each creation of a DateTime object (via whatever mechanism you're
> using to do that). I have seen a case where we were using time_zone =>
> 'local' in a reasonably tight datetime object creation loop and saw
> significant speed increases just by cutting out that chunk of processing.
> 
> In hindsight that was a silly thing to do but it became an easy win :-)
> 
> I apologise if this is what you meant by supplying a time_zone explicitly
> in your comment above.

I have tried to specify the timezone explicitly as a string:
  $dt = DateTime->new( ..., time_zone => "America/Chicago" )
which does not seem to help, but I have not tried to do:
  $tz = DateTime::TimeZone->new( name => 'America/Chicago' )
  $dt = DateTime->new( ..., time_zone => $tz )

I'll try that the next time I have to process one of my data
sets again. ;-)

Thanks for the hint.
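
For reference, a minimal sketch of the cached-zone pattern discussed above. The epoch values and the choice of from_epoch() as the constructor are assumptions for illustration, since the thread does not say how the records are parsed. Whether this is measurably faster than passing the zone name as a string is not something this sketch shows; profiling the real data set is the way to find out.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

# Build the time zone object once, outside the loop.
my $tz = DateTime::TimeZone->new( name => 'America/Chicago' );

# Hypothetical input: one epoch-seconds value per record.
my @epochs = ( 1_335_900_000, 1_335_900_060, 1_335_900_120 );

my @dates;
for my $epoch (@epochs) {
    # Reuse the same zone object for every DateTime that is created.
    push @dates, DateTime->from_epoch( epoch => $epoch, time_zone => $tz );
}

print $_->iso8601, "\n" for @dates;
```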

> 
> I can't recommend using a tool like NYTProf highly enough on a run of your
> tool to spot the low hanging fruit. See
> https://metacpan.org/module/Devel::NYTProf
> 
> Cheers,
> 
> Andrew


Re: DateTime performance

2012-05-04 Thread Philipp K. Janert
On Thursday 03 May 2012 02:10:04 you wrote:
> On 2012.5.1 3:29 PM, Philipp K. Janert wrote:
> > However, when working through files with a few
> > tens of millions of records, DateTime turns into a
> > REAL drag on performance.
> > 
> > Is this expected behavior? And are there access
> > patterns that I can use to mitigate this effect?
> > (I tried to supply a time_zone explicitly, but that
> > does not seem to improve things significantly.)
> 
> Unfortunately due to the way DateTime is architected it does a lot of
> precalculation upon object instantiation which is usually not used.  So
> yes, it is expected in that sense.

Ok.

> 
> If all you need is date objects with a sensible interface, try
> DateTimeX::Lite.  It claims to replicate a good chunk of the DateTime
> interface in a fraction of the memory.

I'll check it out, thanks.

> 
> Given how much time it takes to make a DateTime object, and your scale of
> tens of millions of records, you could cache DateTime objects for each
> timestamp and use clone() to get a new instance.

I considered that, but in reality, most of my timestamps
are actually different. (There are about 30M seconds in
a year, so I won't have much duplication, looking at 10-50M
records spread over a year...)
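
The estimate above can be made concrete with a back-of-the-envelope calculation. This sketch assumes the timestamps are spread uniformly at random over a 365-day year, which is an assumption and not something stated in the thread. The helper name cache_hit_rate is hypothetical.

```perl
use strict;
use warnings;

# Seconds in a 365-day year: 31_536_000, i.e. "about 30M".
my $seconds_per_year = 365 * 24 * 60 * 60;

# Expected fraction of lookups that find an already-seen second,
# assuming timestamps are uniform random over the year (an assumption).
sub cache_hit_rate {
    my ($records) = @_;

    # Expected number of distinct seconds hit by $records random draws.
    my $distinct =
        $seconds_per_year * ( 1 - exp( -$records / $seconds_per_year ) );

    return 1 - $distinct / $records;
}

printf "%2dM records: about %.0f%% cache hits\n",
    $_ / 1e6, 100 * cache_hit_rate($_)
    for 10e6, 50e6;
```

Under that uniform assumption the hit rate comes out near 14% for 10M records and near 50% for 50M records. Real data may be burstier or more evenly spaced, either of which would change these numbers.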

> 
> use feature 'state';
>
> sub get_datetime {
>     my $timestamp = shift;
>
>     # The cache persists across calls.
>     state $cache = {};
>
>     # Build the object only the first time this timestamp is seen.
>     $cache->{$timestamp} //= make_datetime_from_timestamp($timestamp);
>
>     # Always hand out a copy so callers cannot modify the cached object.
>     return $cache->{$timestamp}->clone;
> }
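
The cache above depends on a make_datetime_from_timestamp() helper that the thread never defines. The following is a hypothetical version of it. It assumes the timestamps are Unix epoch seconds in a single zone. It also illustrates why the cache returns clone(): DateTime arithmetic modifies the object in place.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

# Assumption: every record belongs to the same zone.
my $TZ = DateTime::TimeZone->new( name => 'America/Chicago' );

# Hypothetical helper; assumes the timestamp is epoch seconds.
sub make_datetime_from_timestamp {
    my ($epoch) = @_;
    return DateTime->from_epoch( epoch => $epoch, time_zone => $TZ );
}

my $cached = make_datetime_from_timestamp(1_336_000_000);
my $copy   = $cached->clone;

# add() changes the object it is called on, so only the copy moves.
$copy->add( days => 1 );

printf "cached: %s\ncopy:   %s\n", $cached->iso8601, $copy->iso8601;
```

Without the clone() call, a caller that adjusted its DateTime would silently change what every later lookup for the same timestamp receives.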