On 15 Dec, 02:44, Tracy Reed <[email protected]> wrote:
> I have code which looks basically like this:
>
> now = datetime.today()
> beginning = datetime.fromtimestamp(0)
> end = now - timedelta(days=settings.DAYSTOKEEP)
>
> def purgedb():
>     """Delete archivedEmail objects from the beginning of time until
>     daystokeep days in the past."""
>     queryset = archivedEmail.objects.all()
>     purgeset = queryset.filter(received__range=(beginning, end))
>     for email in purgeset:
>         print email
>         try:
>             os.unlink(settings.REAVER_CACHE+"texts/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass
>     purgeset.delete()
>
> if __name__ == '__main__':
>     purgedb()
>
(snip)
> But when purgedb runs it deletes emails 100 at a time (which takes
> forever) and after running for a couple of hours uses a gig and a half
> of RAM. If I let it continue after a number of hours it runs the
> machine out of RAM/swap.
Looks like settings.DEBUG=True to me.
> Am I doing something which is not idiomatic or misusing the ORM
> somehow? My understanding is that it should be lazy so using
> objects.all() on queryset and then narrowing it down with a
> queryset.filter() to make a purgeset should be ok, right?
No problem here as long as you don't do anything that forces
evaluation of the queryset. But this is still redundant - you might as
well build the appropriate queryset directly.
> What can I
> do to make this run in reasonable time/memory?
Others have already suggested checking whether settings.DEBUG is set
to True - it is the usual suspect when it comes to RAM issues with
Django's ORM, since in debug mode Django keeps a record of every SQL
query it executes.
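A minimal sketch of a guard you could put at the top of a long-running script. warn_if_debug is a hypothetical helper name, and fake_settings only stands in for the real settings module so that the snippet runs without Django:

```python
import warnings
from types import SimpleNamespace


def warn_if_debug(settings_obj):
    """Return True, and emit a warning, when DEBUG is enabled on settings_obj."""
    debug_on = bool(getattr(settings_obj, "DEBUG", False))
    if debug_on:
        warnings.warn(
            "settings.DEBUG is True: Django records the SQL queries it runs, "
            "so a long-running script can use far more memory than expected."
        )
    return debug_on


# Stand-in for the real settings module (assumption, for illustration only).
fake_settings = SimpleNamespace(DEBUG=True)
debug_is_on = warn_if_debug(fake_settings)
```

If DEBUG has to stay on for some reason, calling django.db.reset_queries() from time to time inside the loop clears the recorded queries.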
As for the other problem mentioned - building whole model instances
for each row - you can save a lot of work here by using a
values_list() queryset - tuples are very cheap.
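To give a rough feel for "tuples are very cheap", here is a small sketch comparing the shallow size of a one-element tuple with that of a plain Python object holding a few attributes. FakeEmailRow is a hypothetical stand-in, not a Django model (a real model instance carries more state than this), and the exact byte counts depend on the Python version:

```python
import sys


class FakeEmailRow(object):
    """Hypothetical stand-in for a model instance; not a real Django model."""

    def __init__(self, pk, cacheID, received, subject):
        self.pk = pk
        self.cacheID = cacheID
        self.received = received
        self.subject = subject


def instance_footprint(obj):
    """Shallow size of an object plus its attribute dict, in bytes."""
    return sys.getsizeof(obj) + sys.getsizeof(vars(obj))


row_as_instance = FakeEmailRow(1, "abc123", "2009-12-15", "hello")
row_as_tuple = ("abc123",)  # what a one-column values_list row looks like

tuple_bytes = sys.getsizeof(row_as_tuple)
instance_bytes = instance_footprint(row_as_instance)
print("tuple: %d bytes, instance: %d bytes" % (tuple_bytes, instance_bytes))
```

These are shallow sizes only (the attribute values themselves are not counted), so take the numbers as an order-of-magnitude hint rather than a measurement of your actual models.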
Oh, and yes: I/O and filesystem operations are not free either. This
doesn't solve your problem with the script eating all the RAM, but it
surely affects overall performance.
Now for something different - here are a couple of other Python
optimisation tricks:
>     for email in purgeset:
>         print email
Remove this. I/O is not free. Really.
>         try:
>             os.unlink(settings.REAVER_CACHE+"texts/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass
Move all redundant attribute lookups (os.unlink and
settings.REAVER_CACHE) and string concatenations out of this loop.
import os
from datetime import datetime, timedelta

# settings and archivedEmail are imported as in your original script.

def purgedb():
    """Delete archivedEmail objects from the beginning of time until
    daystokeep days in the past.
    """
    # Build the path templates and look up os.unlink once, outside the loop.
    text_cache = settings.REAVER_CACHE + "texts/%s"
    prob_good_cache = settings.REAVER_CACHE + "prob_good/%s"
    prob_spam_cache = settings.REAVER_CACHE + "prob_spam/%s"
    unlink = os.unlink

    # no reason to put this outside the function.
    now = datetime.today()
    beginning = datetime.fromtimestamp(0)
    end = now - timedelta(days=settings.DAYSTOKEEP)

    qs = archivedEmail.objects.filter(received__range=(beginning, end))
    # values_list() yields plain tuples instead of full model instances.
    for row in qs.values_list("cacheID"):
        cacheID = row[0]
        try:
            unlink(text_cache % cacheID)
            unlink(prob_good_cache % cacheID)
            unlink(prob_spam_cache % cacheID)
        except OSError:
            pass
    qs.delete()
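One thing the loop above keeps from your original: the three unlink calls share a single try block, so if the first file is missing, the other two are never attempted. If that is not what you want, here is a minimal sketch of a helper that handles each file on its own. remove_cache_files and CACHE_SUBDIRS are hypothetical names, the subdirectory names are taken from the paths in your code, and I am assuming the cache root is an ordinary directory:

```python
import os

# Subdirectory names taken from the paths used in the original code.
CACHE_SUBDIRS = ("texts", "prob_good", "prob_spam")


def remove_cache_files(cache_root, cache_id, subdirs=CACHE_SUBDIRS):
    """Try to delete cache_id under every subdirectory of cache_root.

    Each unlink gets its own try/except, so a missing file in one
    subdirectory does not stop the others from being removed.
    Returns the number of files actually deleted.
    """
    removed = 0
    for subdir in subdirs:
        path = os.path.join(cache_root, subdir, str(cache_id))
        try:
            os.unlink(path)
        except OSError:
            # Missing or undeletable file: skip it and try the next one.
            continue
        removed += 1
    return removed
```

Inside the loop this would be called as remove_cache_files(settings.REAVER_CACHE, cacheID). It trades a little of the lookup hoisting for correctness, which is probably the right trade given that the filesystem calls dominate anyway.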
Oh, and one last point: how exactly do you run this script?
HTH