Re: Iterating over very large queryset
I've had pretty good results with SQLAlchemy on large datasets; that
might be a painless way to solve the problem.

Ben

Jarek Zgoda wrote:
> Jacob Kaplan-Moss wrote:
>>>> Can you share any hints on how to reduce the memory usage in such a
>>>> situation? The underlying database structure is rather complicated
>>>> and I would like not to do all the queries manually.
>>>
>>> At this level -- hundreds of thousands of objects per query -- I
>>> doubt that any ORM solution is going to deliver decent performance.
>>
>> That may be true, but in this case it's actually a bug in Django:
>> under some (many?) circumstances QuerySets load the entire result set
>> into memory even when used with an iterator. The good news is that
>> this has been fixed; the bad news is that the fix is on the
>> queryset-refactor branch, which likely won't merge to trunk for at
>> least a few more weeks, if not longer. Also note that values() won't
>> really help you, either: the overhead for a model instance isn't all
>> that big; the problem you're running into is simply the sheer number
>> of results.
>>
>> In your situation, I'd do one of two things: either switch to the
>> qsrf branch if you're a living-on-the-edge kind of guy, or else look
>> into using an ObjectPaginator to churn through results in chunks.
>
> Ah, thanks. The application is in production, so I can't use any
> Django version other than the one we have installed on the machines.
> I'll try ObjectPaginator, leaving raw SQL as a last resort. Thank you
> all for the hints.
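For readers wondering what Ben's suggestion might look like in practice,
here is a minimal sketch: it bypasses the ORM entirely and pulls rows
from the server-side cursor in fixed-size batches with fetchmany(), so
only one batch is in memory at a time. The connection string, the
"entry" table name, and the index_row() function are hypothetical
stand-ins, not anything from the original script.

    from sqlalchemy import MetaData, Table, create_engine

    # Hypothetical connection string and table name.
    engine = create_engine('postgres://user:pass@localhost/appdb')
    metadata = MetaData(engine)
    entry = Table('entry', metadata, autoload=True)  # reflect the existing table

    result = engine.execute(entry.select())
    while True:
        rows = result.fetchmany(2000)   # fetch rows in batches, never all at once
        if not rows:
            break
        for row in rows:
            index_row(row)              # stand-in for the solr-indexing code
    result.close()

Because fetchmany() drains the result proxy incrementally, peak memory
stays proportional to the batch size rather than to the table size.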
Re: Iterating over very large queryset
Jacob Kaplan-Moss wrote:

>>> Can you share any hints on how to reduce the memory usage in such a
>>> situation? The underlying database structure is rather complicated
>>> and I would like not to do all the queries manually.
>>
>> At this level -- hundreds of thousands of objects per query -- I
>> doubt that any ORM solution is going to deliver decent performance.
>
> That may be true, but in this case it's actually a bug in Django:
> under some (many?) circumstances QuerySets load the entire result set
> into memory even when used with an iterator. The good news is that
> this has been fixed; the bad news is that the fix is on the
> queryset-refactor branch, which likely won't merge to trunk for at
> least a few more weeks, if not longer. Also note that values() won't
> really help you, either: the overhead for a model instance isn't all
> that big; the problem you're running into is simply the sheer number
> of results.
>
> In your situation, I'd do one of two things: either switch to the qsrf
> branch if you're a living-on-the-edge kind of guy, or else look into
> using an ObjectPaginator to churn through results in chunks.

Ah, thanks. The application is in production, so I can't use any Django
version other than the one we have installed on the machines. I'll try
ObjectPaginator, leaving raw SQL as a last resort. Thank you all for the
hints.

--
Jarek Zgoda
Skype: jzgoda | GTalk: [EMAIL PROTECTED] | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)
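For the archive, a rough sketch of that ObjectPaginator approach: in the
pre-1.0 Django API the paginator wraps a queryset and issues one query
per page, so only a page's worth of objects is alive at a time. Entry
and index_entry() are hypothetical stand-ins, and page numbers are
zero-based in this old API, if memory serves.

    from django.core.paginator import ObjectPaginator

    paginator = ObjectPaginator(Entry.objects.all(), 2000)  # 2000 objects per page
    for page_number in range(paginator.pages):
        for obj in paginator.get_page(page_number):  # one LIMIT/OFFSET query per page
            index_entry(obj)  # stand-in for pushing the object into solr

One caveat: each page is a separate LIMIT/OFFSET query, so on most
databases the later pages get progressively more expensive to fetch.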
Re: Iterating over very large queryset
On 1/30/08, James Bennett <[EMAIL PROTECTED]> wrote:
> On Jan 30, 2008 10:21 AM, Jarek Zgoda <[EMAIL PROTECTED]> wrote:
>> Can you share any hints on how to reduce the memory usage in such a
>> situation? The underlying database structure is rather complicated
>> and I would like not to do all the queries manually.
>
> At this level -- hundreds of thousands of objects per query -- I doubt
> that any ORM solution is going to deliver decent performance.

That may be true, but in this case it's actually a bug in Django: under
some (many?) circumstances QuerySets load the entire result set into
memory even when used with an iterator. The good news is that this has
been fixed; the bad news is that the fix is on the queryset-refactor
branch, which likely won't merge to trunk for at least a few more weeks,
if not longer. Also note that values() won't really help you, either:
the overhead for a model instance isn't all that big; the problem
you're running into is simply the sheer number of results.

In your situation, I'd do one of two things: either switch to the qsrf
branch if you're a living-on-the-edge kind of guy, or else look into
using an ObjectPaginator to churn through results in chunks.

Jacob
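For context, the usage Jacob is describing is the plain iterator() call;
the sketch below shows how it is meant to behave once the
queryset-refactor fix lands, streaming rows instead of filling the
queryset's internal result cache. Entry and index_entry() are
hypothetical stand-ins.

    # On the qsrf branch, iterator() should stream results rather than
    # caching the whole result set on the queryset object.
    for obj in Entry.objects.all().iterator():
        index_entry(obj)  # stand-in per-object work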
Re: Iterating over very large queryset
On Jan 30, 2008 10:21 AM, Jarek Zgoda <[EMAIL PROTECTED]> wrote:
> Can you share any hints on how to reduce the memory usage in such a
> situation? The underlying database structure is rather complicated
> and I would like not to do all the queries manually.

At this level -- hundreds of thousands of objects per query -- I doubt
that any ORM solution is going to deliver decent performance.

--
"Bureaucrat Conrad, you are technically correct -- the best kind of
correct."
Iterating over very large queryset
Hello,

I'd like to get some hints on memory usage by a Python/Django program.

I have a standalone script that populates a Solr index with Django
application data. The queryset I have to process has several hundred
thousand objects. Whether or not I use iterator(), the script eats more
and more memory, eventually bringing the whole system to a crawl with
heavy swapping. I tried slicing the queryset into batches of 2000
objects, explicitly calling "del" on retrieved objects, assigning None
to the names I use... None of that helps; after retrieving on the order
of a hundred thousand objects the whole system is unusable. It looks
like the memory that should be freed is never reused. The DEBUG setting
seems to have no impact on overall memory usage. Using values() is not
an option because I have to follow a few foreign keys.

Can you share any hints on how to reduce the memory usage in such a
situation? The underlying database structure is rather complicated and
I would like not to do all the queries manually.

--
Jarek Zgoda
Skype: jzgoda | GTalk: [EMAIL PROTECTED] | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)
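One workaround commonly suggested for exactly this situation on
trunk-era Django is to walk the table in primary-key order instead of
slicing with OFFSET, and to clear Django's per-connection query log
between batches. A minimal sketch, assuming a hypothetical Entry model
with the default auto-increment "id" column and a stand-in
index_entry() function:

    from django.db import reset_queries

    last_id = 0
    while True:
        # Each batch is an independent query keyed on the primary key;
        # no OFFSET, so the per-batch cost stays flat.
        batch = list(Entry.objects.filter(id__gt=last_id).order_by('id')[:2000])
        if not batch:
            break
        for obj in batch:
            index_entry(obj)    # stand-in for the solr-indexing call
        last_id = batch[-1].id
        reset_queries()         # drop the query log Django keeps when DEBUG=True

The reset_queries() call matters because with DEBUG=True Django appends
every executed query to django.db.connection.queries, which grows
without bound in a long-running script.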