On Mon, 2009-12-14 at 17:44 -0800, Tracy Reed wrote:
> I have code which looks basically like this:
>
> now = datetime.today()
> beginning = datetime.fromtimestamp(0)
> end = now - timedelta(days=settings.DAYSTOKEEP)
>
> def purgedb():
>     """Delete archivedEmail objects from the beginning of time until
>     daystokeep days in the past."""
>     queryset = archivedEmail.objects.all()
>     purgeset = queryset.filter(received__range=(beginning, end))
>     for email in purgeset:
>         print email
>         try:
>             os.unlink(settings.REAVER_CACHE + "texts/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE + "prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE + "prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass
>     purgeset.delete()
>
> if __name__ == '__main__':
>     purgedb()
>
> The idea is that we are stuffing a bunch of emails in a database for
> customer service purposes. I want to clear out anything older than
> DAYSTOKEEP. The model looks like this:
>
> class archivedEmail(models.Model):
>     subject = models.CharField(blank=True, max_length=512, null=True)
>     toAddress = models.CharField(blank=True, max_length=128, db_index=True)
>     fromAddress = models.CharField(blank=True, max_length=128, db_index=True)
>     date = models.DateTimeField()
>     received = models.DateTimeField(db_index=True)
>     crmScore = models.FloatField()
>     spamStatus = models.CharField(max_length=6, choices=spamStatusChoices,
>                                   db_index=True)
>     cacheHost = models.CharField(max_length=24)
>     cacheID = models.CharField(max_length=31, primary_key=True)
>
>     class Meta:
>         ordering = ('-received',)
>
> But when purgedb runs, it deletes emails 100 at a time (which takes
> forever), and after running for a couple of hours it uses a gig and a
> half of RAM. If I let it continue, after a number of hours it runs the
> machine out of RAM/swap.
>
> Am I doing something which is not idiomatic, or misusing the ORM
> somehow?
> My understanding is that it should be lazy, so using
> objects.all() to get a queryset and then narrowing it down with
> queryset.filter() to make a purgeset should be OK, right? What can I
> do to make this run in reasonable time/memory?
>
> PS: I used to have ordering set to -date in the class Meta, but that
> caused the db to always put an ORDER BY date on the SELECT query. That
> was unnecessary in this case, and it took ages sorting a couple million
> rows, since there is no index on date (nor did there need to be, I
> thought, since we never select on it). Changing it to received makes no
> difference to my app but avoids creating another index. Django's is the
> first ORM I have ever used, and these sneaky performance issues are
> making me wonder...
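[Editor's note: a common fix for this pattern is to stream the rows instead of letting the queryset cache them all, and to delete in bounded primary-key batches so each DELETE stays small. The batching helper below is plain, framework-agnostic Python; the Django usage in the trailing comment is an untested sketch that reuses the names from the post (`purgeset`, `cacheID`, `archivedEmail`) and assumes `QuerySet.iterator()` / `values_list()`, which existed in Django of that era.]

```python
def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`, lazily.

    Works on any iterable, including a lazy database cursor, so the
    whole result set never has to sit in memory at once.
    """
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly short, batch
        yield batch

# Sketch of how purgedb() from the thread could use it (hypothetical,
# not from the original post):
#
#   ids = purgeset.values_list('cacheID', flat=True).iterator()
#   for batch in chunked(ids, 500):
#       for cache_id in batch:
#           ...unlink the three cache files for cache_id...
#       archivedEmail.objects.filter(cacheID__in=batch).delete()
```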
If you have the DEBUG=True setting, Django also records _every_ SQL query made to the database, and depending on the case that can use quite a lot of memory.

--
Jani Tiainen
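[Editor's note: a minimal simulation of that failure mode. With DEBUG=True, each query's SQL is appended to an in-memory list (Django's connection.queries), and nothing empties it outside the request/response cycle, so a long-running script grows without bound. `query_log` and `run_query` below are stand-in names, not Django APIs.]

```python
query_log = []  # stand-in for django.db.connection.queries

def run_query(sql, debug=True):
    """Pretend to execute sql, logging it the way Django does under DEBUG."""
    if debug:
        query_log.append({'sql': sql})
    # ... actual execution would happen here ...

# A purge loop over a few million rows issues many queries; every one
# of them is retained for the life of the process:
for i in range(1000):
    run_query("SELECT ... LIMIT 100 OFFSET %d" % (i * 100))

# Remedies in a standalone script: set settings.DEBUG = False, or call
# django.db.reset_queries() periodically, which effectively does this:
query_log[:] = []
```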