DateTime performance

2012-05-03 Thread Philipp K. Janert

Question:

When using DateTime for a large number of
instances, it becomes a serious performance
drag. 

A typical application for me involves things like
log files: I use DateTime to translate the timestamps 
in these files into a canonical format, and then get 
information such as day-of-week or time-of-day 
from DateTime. 

However, when working through files with a few 
tens of millions of records, DateTime turns into a 
REAL drag on performance.

Is this expected behavior? And are there access
patterns that I can use to mitigate this effect? 
(I tried to supply a time_zone explicitly, but that
does not seem to improve things significantly.)

Best,

Ph.



Re: DateTime performance

2012-05-03 Thread Michael G Schwern
On 2012.5.1 3:29 PM, Philipp K. Janert wrote:
> However, when working through files with a few
> tens of millions of records, DateTime turns into a
> REAL drag on performance.
>
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect?
> (I tried to supply a time_zone explicitly, but that
> does not seem to improve things significantly.)

Unfortunately, due to the way DateTime is architected, it does a lot of
precalculation upon object instantiation, most of which usually goes unused.
So yes, it is expected in that sense.

If all you need is date objects with a sensible interface, try
DateTimeX::Lite.  It claims to replicate a good chunk of the DateTime
interface in a fraction of the memory.

Given how much time it takes to make a DateTime object, and your scale of tens
of millions of records, you could cache DateTime objects for each timestamp
and use clone() to get a new instance.

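use feature 'state';    # 'state' variables need Perl 5.10 or later

# make_datetime_from_timestamp() stands for whatever parsing code you
# already use to build a DateTime object from a timestamp.
sub get_datetime {
    my $timestamp = shift;

    # Persists across calls, so each distinct timestamp is built only once.
    state $cache = {};

    if( !defined $cache->{$timestamp} ) {
        $cache->{$timestamp} = make_datetime_from_timestamp($timestamp);
    }

    # Always return a clone so callers cannot modify the cached object.
    return $cache->{$timestamp}->clone;
}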




RE: DateTime performance

2012-05-03 Thread Andrew O'Brien
> From: Philipp K. Janert [mailto:jan...@ieee.org]
> Sent: Wednesday, 2 May 2012 8:29 AM
>
> Question:
>
> When using DateTime for a large number of
> instances, it becomes a serious performance
> drag.
...
> Is this expected behavior? And are there access
> patterns that I can use to mitigate this effect?
> (I tried to supply a time_zone explicitly, but that
> does not seem to improve things significantly.)

Hi Philipp,

My #1 tip is to create the DateTime::TimeZone object once, cache it, and pass 
it in every time you create a DateTime object (via whatever mechanism you're 
using to do that). I have seen a case where we were using time_zone => 'local' 
in a reasonably tight DateTime object creation loop, and we saw significant 
speed increases just by cutting out that chunk of processing.

In hindsight that was a silly thing to do but it became an easy win :-)

I apologise if this is what you meant by supplying a time_zone explicitly in 
your comment above.
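
A minimal sketch of that tip, under some assumptions: the timestamps have
already been split into numeric fields, and the sample records, the zone name
'America/New_York', and the variable names $tz and @records are made up for
illustration (name => 'local' can be substituted in the same place). The only
point is that DateTime::TimeZone->new is called once, outside the loop, and
the resulting object is passed as time_zone.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

# Resolve the zone once, outside the loop. The zone name is an assumption
# for this sketch; use name => 'local' if that is what you need.
my $tz = DateTime::TimeZone->new( name => 'America/New_York' );

# Hypothetical sample records: [ year, month, day, hour, minute, second ]
my @records = (
    [ 2012, 5, 1, 15, 29, 0 ],
    [ 2012, 5, 3,  2, 53, 0 ],
);

my @days;
for my $r (@records) {
    my $dt = DateTime->new(
        year      => $r->[0],
        month     => $r->[1],
        day       => $r->[2],
        hour      => $r->[3],
        minute    => $r->[4],
        second    => $r->[5],
        time_zone => $tz,    # reuse the cached object, not a zone name
    );
    push @days, $dt->day_name;
}

print join( ", ", @days ), "\n";
```

Passing the object rather than a string means DateTime does not have to look
the zone up again for every record.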

I can't recommend highly enough running a profiler such as NYTProf on your 
tool to spot the low-hanging fruit. See https://metacpan.org/module/Devel::NYTProf

Cheers,

Andrew


Re: DateTime performance

2012-05-03 Thread Rick Measham
In the spirit of TIMTOWTDI, there's my DateTime::LazyInit module which I wrote 
for this sort of case. It only inflates to a full DateTime object when you call 
methods that aren't simple. 

http://search.cpan.org/~rickm/DateTime-LazyInit-1.0200/lib/DateTime/LazyInit.pm

Caveat: I haven't tested it against any recent DateTime releases. 

Cheers!
Rick Measham


-- 
Message protected for iSite by MailGuard: e-mail anti-virus, anti-spam and 
content filtering. http://www.mailguard.com.au



Re: DateTime performance

2012-05-03 Thread Ashley Pond V
I love and use DateTime, but for 10s of millions of records at once I
would choose Date::Calc instead and deal with any necessary
futzy bits manually.
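
As a rough sketch of what the Date::Calc route could look like for the
day-of-week and time-of-day case from the original question: the timestamp
format "YYYY-MM-DD hh:mm:ss" and the helper name day_and_hour are assumptions
made for this example, and nothing here handles time zones, which would be
among the bits left to do manually.

```perl
use strict;
use warnings;
use Date::Calc qw(Day_of_Week Day_of_Week_to_Text);

# Hypothetical helper; assumes timestamps look like "YYYY-MM-DD hh:mm:ss".
sub day_and_hour {
    my ($stamp) = @_;
    my ( $y, $m, $d, $hour ) =
        $stamp =~ /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):\d{2}:\d{2}$/
        or return;    # not in the assumed format

    # Day_of_Week returns 1 (Monday) through 7 (Sunday).
    my $dow = Day_of_Week( $y, $m, $d );
    return ( Day_of_Week_to_Text($dow), $hour + 0 );
}

my ( $day, $hour ) = day_and_hour('2012-05-03 02:53:00');
print "$day, hour $hour\n";
```

No objects are created per record here, only plain function calls on
integers, which is why this approach tends to suit very large inputs.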
