On Mon, 2009-12-14 at 17:44 -0800, Tracy Reed wrote:
> I have code which looks basically like this:
>
> now = datetime.today()
> beginning = datetime.fromtimestamp(0)
> end = now - timedelta(days=settings.DAYSTOKEEP)
>
> def purgedb():
>     """Delete archivedEmail objects from the beginning of time until
>     daystokeep days in the past."""
>     queryset = archivedEmail.objects.all()
>     purgeset = queryset.filter(received__range=(beginning, end))
>     for email in purgeset:
>         print email
>         try:
>             os.unlink(settings.REAVER_CACHE+"texts/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass
>     purgeset.delete()
>
> if __name__ == '__main__':
>     purgedb()
>
> The idea is that we are stuffing a bunch of emails in a database for
> customer service purposes. I want to clear out anything older than
> DAYSTOKEEP. The model looks like this:
>
> class archivedEmail(models.Model):
>     subject = models.CharField(blank=True, max_length=512, null=True)
>     toAddress = models.CharField(blank=True, max_length=128, db_index=True)
>     fromAddress = models.CharField(blank=True, max_length=128, db_index=True)
>     date = models.DateTimeField()
>     received = models.DateTimeField(db_index=True)
>     crmScore = models.FloatField()
>     spamStatus = models.CharField(max_length=6, choices=spamStatusChoices,
>                                   db_index=True)
>     cacheHost = models.CharField(max_length=24)
>     cacheID = models.CharField(max_length=31, primary_key=True)
>
>     class Meta:
>         ordering = ('-received',)
>
> But when purgedb runs, it deletes emails 100 at a time (which takes
> forever), and after running for a couple of hours it uses a gig and a
> half of RAM. If I let it continue, after a number of hours it runs the
> machine out of RAM/swap.
>
> Am I doing something unidiomatic or misusing the ORM somehow? My
> understanding is that querysets are lazy, so building a queryset with
> objects.all() and then narrowing it down with queryset.filter() to
> make a purgeset should be OK, right? What can I do to make this run in
> reasonable time/memory?
>
> PS: I used to have ordering set to -date in the class Meta, but that
> put an ORDER BY date on every SELECT, which was unnecessary in this
> case and took ages sorting a couple million rows, since there is no
> index on date (nor, I thought, did there need to be, since we never
> select on it). Changing the ordering to received makes no difference
> to my app but avoids creating another index. Django's is the first
> ORM I have ever used, and these sneaky performance issues are making
> me wonder...
>
If you have DEBUG=True in settings, Django also records _every_ SQL
query made to the database, which, depending on the case, can use
quite a lot of memory.
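
The other half of the memory growth is that iterating a queryset fills
its result cache, so purgeset ends up holding every model instance it
has yielded. Deleting in primary-key chunks sidesteps both problems.
A minimal sketch (the model, field, and setting names are from your
post; the chunk size, the cache_paths helper, and the reset_queries
call are my own assumptions, untested outside a real project):

```python
# Chunked rewrite of purgedb(). archivedEmail, received, cacheID,
# REAVER_CACHE, and DAYSTOKEEP come from the original post; CHUNK and
# cache_paths() are illustrative names, not part of the original code.
import os
from datetime import datetime, timedelta

CHUNK = 500  # primary keys fetched (and rows deleted) per round trip


def cache_paths(cache_root, cache_id):
    """Return the three cache files kept for one archived email."""
    return [os.path.join(cache_root, subdir, cache_id)
            for subdir in ('texts', 'prob_good', 'prob_spam')]


def purgedb():
    """Delete archivedEmail rows older than DAYSTOKEEP days, CHUNK at a time."""
    from django import db
    from django.conf import settings

    end = datetime.today() - timedelta(days=settings.DAYSTOKEEP)
    while True:
        # values_list() pulls back only primary keys, so each pass holds
        # at most CHUNK strings in memory instead of CHUNK model objects.
        ids = list(archivedEmail.objects
                   .filter(received__lt=end)
                   .values_list('cacheID', flat=True)[:CHUNK])
        if not ids:
            break
        for cache_id in ids:
            for path in cache_paths(settings.REAVER_CACHE, cache_id):
                try:
                    os.unlink(path)
                except OSError:
                    pass
        archivedEmail.objects.filter(cacheID__in=ids).delete()
        # With DEBUG=True, Django appends every executed query to an
        # in-memory log; clear it each pass so a long run stays flat.
        db.reset_queries()
```

Note that os.path.join assumes REAVER_CACHE has no trailing slash,
whereas your string concatenation assumed one; adjust to taste.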
--
Jani Tiainen
--
You received this message because you are subscribed to the Google Groups
"Django users" group.