DateTime performance
Question:

When using DateTime for a large number of instances, it becomes a serious performance drag.

A typical application for me involves things like log files: I use DateTime to translate the timestamps in these files into a canonical format, and then get information such as day-of-week or time-of-day from DateTime.

However, when working through files with a few tens of millions of records, DateTime turns into a REAL drag on performance.

Is this expected behavior? And are there access patterns that I can use to mitigate this effect? (I tried to supply a time_zone explicitly, but that does not seem to improve things significantly.)

Best,

Ph.
Re: DateTime performance
On 2012.5.1 3:29 PM, Philipp K. Janert wrote:

> However, when working through files with a few tens of millions of
> records, DateTime turns into a REAL drag on performance.
>
> Is this expected behavior? And are there access patterns that I can
> use to mitigate this effect? (I tried to supply a time_zone explicitly,
> but that does not seem to improve things significantly.)

Unfortunately, because of the way DateTime is architected, it does a lot of precalculation on object instantiation, most of which usually goes unused. So yes, it is expected in that sense.

If all you need is date objects with a sensible interface, try DateTimeX::Lite. It claims to replicate a good chunk of the DateTime interface in a fraction of the memory.

Given how much time it takes to make a DateTime object, and your scale of tens of millions of records, you could cache a DateTime object for each distinct timestamp and use clone() to get a new instance.

    use feature 'state';    # "state" variables need Perl 5.10 or later

    sub get_datetime {
        my $timestamp = shift;

        # One cached DateTime object per distinct timestamp string.
        state $cache = {};

        if( !defined $cache->{$timestamp} ) {
            # make_datetime_from_timestamp() stands for your own parser.
            $cache->{$timestamp} = make_datetime_from_timestamp($timestamp);
        }

        # Always return a clone so callers cannot alter the cached object.
        return $cache->{$timestamp}->clone;
    }

-- 
100. Claymore mines are not filled with yummy candy, and it is wrong to tell new soldiers that they are.
    -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army
       http://skippyslist.com/list/
RE: DateTime performance
From: Philipp K. Janert [mailto:jan...@ieee.org]
Sent: Wednesday, 2 May 2012 8:29 AM

> Question: When using DateTime for a large number of instances, it
> becomes a serious performance drag.
> ...
> Is this expected behavior? And are there access patterns that I can
> use to mitigate this effect? (I tried to supply a time_zone explicitly,
> but that does not seem to improve things significantly.)

Hi Philipp,

My #1 tip is to build the DateTime::TimeZone object once, cache it, and pass it in every time you create a DateTime object (via whatever mechanism you're using to do that).

I have seen a case where we were using time_zone => 'local' in a reasonably tight DateTime object creation loop and saw significant speed increases just by cutting out that chunk of processing. In hindsight that was a silly thing to do, but it became an easy win :-)

I apologise if this is what you meant by supplying a time_zone explicitly in your comment above.

I can't recommend highly enough running a tool like NYTProf over your program to spot the low-hanging fruit. See https://metacpan.org/module/Devel::NYTProf

Cheers,
Andrew
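A minimal sketch of the time-zone tip above, assuming the timestamps are already available as epoch seconds. The zone name 'America/New_York' and the sample values in @epochs are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::TimeZone;

# Build the time zone object once, outside the loop.
# The zone name is an assumed example.
my $tz = DateTime::TimeZone->new( name => 'America/New_York' );

# Hypothetical epoch timestamps standing in for log records.
my @epochs = ( 1335900540, 1335986940 );

my @days;
for my $epoch (@epochs) {
    # Pass the prebuilt object rather than a name such as 'local',
    # so the zone is not resolved again for every record.
    my $dt = DateTime->from_epoch( epoch => $epoch, time_zone => $tz );
    push @days, $dt->day_of_week;    # 1 = Monday ... 7 = Sunday
}
```

Whether this helps much depends on where the time actually goes, which is what a profiler run would show.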
Re: DateTime performance
In the spirit of TIMTOWTDI, there's my DateTime::LazyInit module, which I wrote for this sort of case. It only inflates to a full DateTime object when you call methods that aren't simple.

http://search.cpan.org/~rickm/DateTime-LazyInit-1.0200/lib/DateTime/LazyInit.pm

Caveat: I haven't tested it against any recent DateTime releases.

Cheers!
Rick Measham
Re: DateTime performance
I love and use DateTime, but for 10s of millions of records at once I would be choosing Date::Calc instead and dealing with any necessary futzy bits manually.
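A minimal sketch of the Date::Calc route, assuming timestamps in "YYYY-MM-DD hh:mm:ss" form; the sample value in $stamp is hypothetical. Day_of_Week() is a Date::Calc function. The time-of-day arithmetic is done by hand, and time zone handling is left out entirely, which is one of the futzy bits you would have to deal with yourself.

```perl
use strict;
use warnings;
use Date::Calc qw(Day_of_Week);

# Hypothetical log timestamp; the format is an assumption.
my $stamp = '2012-05-01 15:29:00';

# Pull the fields out with a regex; die on anything unexpected.
my ( $year, $month, $day, $hour, $min, $sec ) =
    $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "Unrecognized timestamp: $stamp";

# No object is created; Day_of_Week returns 1 = Monday ... 7 = Sunday.
my $dow = Day_of_Week( $year, $month, $day );

# Time of day as seconds since midnight, computed manually.
my $seconds_into_day = $hour * 3600 + $min * 60 + $sec;
```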