On Mon, 2009-12-14 at 17:44 -0800, Tracy Reed wrote:
> I have code which looks basically like this:
> 
> now        = datetime.today()
> beginning  = datetime.fromtimestamp(0)
> end        = now - timedelta(days=settings.DAYSTOKEEP)
> 
> def purgedb():
>     """Delete archivedEmail objects from the beginning of time until
>     daystokeep days in the past."""
>     queryset   = archivedEmail.objects.all()
>     purgeset   = queryset.filter(received__range=(beginning, end))
>     for email in purgeset:
>         print email
>         try:
>             os.unlink(settings.REAVER_CACHE+"texts/%s"     % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass
>     purgeset.delete()
> 
> if __name__ == '__main__':
>     purgedb()
> 
> The idea is that we are stuffing a bunch of emails in a database for
> customer service purposes. I want to clear out anything older than
> DAYSTOKEEP. The model looks like this:
> 
> class archivedEmail(models.Model):
>     subject     = models.CharField(blank=True, max_length=512, null=True)
>     toAddress   = models.CharField(blank=True, max_length=128, db_index=True)
>     fromAddress = models.CharField(blank=True, max_length=128, db_index=True)
>     date        = models.DateTimeField()
>     received    = models.DateTimeField(db_index=True)
>     crmScore    = models.FloatField()
>     spamStatus  = models.CharField(max_length=6, choices=spamStatusChoices, db_index=True)
>     cacheHost   = models.CharField(max_length=24)
>     cacheID     = models.CharField(max_length=31, primary_key=True)
> 
>     class Meta:
>         ordering = ('-received',)
> 
> But when purgedb runs it deletes emails 100 at a time (which takes
> forever) and after running for a couple of hours uses a gig and a half
> of RAM. If I let it continue after a number of hours it runs the
> machine out of RAM/swap.
> 
> Am I doing something which is not idiomatic or misusing the ORM
> somehow? My understanding is that it should be lazy so using
> objects.all() on queryset and then narrowing it down with a
> queryset.filter() to make a purgeset should be ok, right? What can I
> do to make this run in reasonable time/memory?
> 
> PS: I used to have ordering set to -date in the class Meta but that
> caused the db to always put an ORDER BY date on the select query which
> was unnecessary in this case causing it to take ages sorting a couple
> million rows since there is no index on date (nor did there need to
> be, so I thought, since we never select on it). Changing it to
> received makes no difference to my app but avoids creating another
> index. Django's is the first ORM I have ever used and these sneaky
> performance issues are making me wonder...
> 

If you have DEBUG=True in your settings, Django also records _every_ SQL
query sent to the database, and depending on the case that can use quite
a lot of memory.
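A minimal plain-Python sketch of that failure mode, assuming only that each
executed query appends an entry to an in-memory log (which is what Django's
connection.queries does under DEBUG=True, with django.db.reset_queries()
being the call that empties it); the run_query/reset_queries functions here
are stand-ins, not Django code:

```python
# Stand-in for Django's DEBUG=True query log: under DEBUG, every
# executed query is appended to django.db.connection.queries and
# is never discarded, so memory grows with the number of queries.
query_log = []

def run_query(sql):
    """Pretend to execute a query; under DEBUG the SQL gets logged."""
    query_log.append({"sql": sql, "time": "0.002"})

def reset_queries():
    """Stand-in for django.db.reset_queries(), which empties the log."""
    del query_log[:]

# A purge that issues one DELETE per row piles up one log entry per row.
for i in range(1000):
    run_query("DELETE FROM archived_email WHERE cache_id = '%d'" % i)

grown = len(query_log)   # grows without bound while DEBUG=True
reset_queries()
cleared = len(query_log)  # back to zero once the log is emptied
```

In the real script the equivalent fix is running with DEBUG=False, or
calling django.db.reset_queries() periodically inside the loop.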

-- 

Jani Tiainen

