Re: Iterating over very large queryset

2008-01-31 Thread Ben Ford


I've had pretty good results with SQLAlchemy on large datasets; that
might be a painless way to solve the problem.
Ben

Jarek Zgoda wrote:
> Jacob Kaplan-Moss wrote:
> 
>>>> Can you share any hints on how to reduce the memory usage in such a
>>>> situation? The underlying database structure is rather complicated and I
>>>> would prefer not to write all the queries by hand.
>>> At this level -- hundreds of thousands of objects per query -- I doubt
>>> that any ORM solution is going to deliver decent performance.
>> That may be true, but in this case it's actually a bug in Django:
>> under some (many?) circumstances QuerySets load the entire result set
>> into memory even when used with an iterator. The good news is that
>> this has been fixed; the bad news is that the fix is on the
>> queryset-refactor branch, which likely won't merge to trunk for at
>> least a few more weeks, if not longer. Also note that values() won't
>> really help you, either: the overhead for a model instance isn't all
>> that big; the problem you're running into is simply the sheer number
>> of results.
>>
>> In your situation, I'd do one of two things: either switch to the qsrf
>> branch if you're a living-on-the-edge kind of guy, or else look into
>> using an ObjectPaginator to churn through results in chunks.
> 
> Ah, thanks. The application is in production, so I can't use any
> Django version other than the one we have installed on the machines.
> I'll try ObjectPaginator, leaving raw SQL as a last resort. Thank you
> all for the hints.
> 

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Iterating over very large queryset

2008-01-31 Thread Jarek Zgoda

Jacob Kaplan-Moss wrote:

>>> Can you share any hints on how to reduce the memory usage in such a
>>> situation? The underlying database structure is rather complicated and I
>>> would prefer not to write all the queries by hand.
>> At this level -- hundreds of thousands of objects per query -- I doubt
>> that any ORM solution is going to deliver decent performance.
> 
> That may be true, but in this case it's actually a bug in Django:
> under some (many?) circumstances QuerySets load the entire result set
> into memory even when used with an iterator. The good news is that
> this has been fixed; the bad news is that the fix is on the
> queryset-refactor branch, which likely won't merge to trunk for at
> least a few more weeks, if not longer. Also note that values() won't
> really help you, either: the overhead for a model instance isn't all
> that big; the problem you're running into is simply the sheer number
> of results.
> 
> In your situation, I'd do one of two things: either switch to the qsrf
> branch if you're a living-on-the-edge kind of guy, or else look into
> using an ObjectPaginator to churn through results in chunks.

Ah, thanks. The application is in production, so I can't use any
Django version other than the one we have installed on the machines.
I'll try ObjectPaginator, leaving raw SQL as a last resort. Thank you
all for the hints.

-- 
Jarek Zgoda
Skype: jzgoda | GTalk: [EMAIL PROTECTED] | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)




Re: Iterating over very large queryset

2008-01-30 Thread Jacob Kaplan-Moss

On 1/30/08, James Bennett <[EMAIL PROTECTED]> wrote:
> On Jan 30, 2008 10:21 AM, Jarek Zgoda <[EMAIL PROTECTED]> wrote:
> > Can you share any hints on how to reduce the memory usage in such a
> > situation? The underlying database structure is rather complicated and I
> > would prefer not to write all the queries by hand.
>
> At this level -- hundreds of thousands of objects per query -- I doubt
> that any ORM solution is going to deliver decent performance.

That may be true, but in this case it's actually a bug in Django:
under some (many?) circumstances QuerySets load the entire result set
into memory even when used with an iterator. The good news is that
this has been fixed; the bad news is that the fix is on the
queryset-refactor branch, which likely won't merge to trunk for at
least a few more weeks, if not longer. Also note that values() won't
really help you, either: the overhead for a model instance isn't all
that big; the problem you're running into is simply the sheer number
of results.

In your situation, I'd do one of two things: either switch to the qsrf
branch if you're a living-on-the-edge kind of guy, or else look into
using an ObjectPaginator to churn through results in chunks.
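
The chunked approach described above can be sketched in plain Python
(no Django imports here, so the helper name is illustrative rather
than any actual Django API): repeatedly fetch a fixed-size slice by
offset until the source is exhausted, which mirrors the OFFSET/LIMIT
paging ObjectPaginator performs under the hood.

```python
def iter_in_chunks(fetch_slice, chunk_size=2000):
    """Yield items chunk by chunk.

    fetch_slice(offset, limit) must return a list of at most `limit`
    items starting at `offset`.  With Django this would wrap a
    queryset slice like qs[offset:offset + limit], which issues a
    bounded OFFSET/LIMIT query instead of loading everything at once.
    """
    offset = 0
    while True:
        chunk = fetch_slice(offset, chunk_size)
        if not chunk:
            break
        for item in chunk:
            yield item
        offset += len(chunk)

# Stand-in for a large queryset: a plain list sliced on demand.
data = list(range(10))
seen = list(iter_in_chunks(lambda off, lim: data[off:off + lim],
                           chunk_size=3))
```

Only one chunk is referenced at a time, so already-processed objects
can be garbage-collected between fetches.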

Jacob




Re: Iterating over very large queryset

2008-01-30 Thread James Bennett

On Jan 30, 2008 10:21 AM, Jarek Zgoda <[EMAIL PROTECTED]> wrote:
> Can you share any hints on how to reduce the memory usage in such a
> situation? The underlying database structure is rather complicated and I
> would prefer not to write all the queries by hand.

At this level -- hundreds of thousands of objects per query -- I doubt
that any ORM solution is going to deliver decent performance.


-- 
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."




Iterating over very large queryset

2008-01-30 Thread Jarek Zgoda

Hello, I'd like to get some hints on the memory usage of a Python/Django program.

I have a standalone script that populates a Solr index with Django
application data. The queryset I have to process has over 16
objects. No matter whether I use iterator() or not, the script eats
more and more memory, finally causing the whole system to crawl due to
extensive swapping. I tried slicing the queryset into batches of 2000
objects, explicitly calling "del" on retrieved objects, and assigning
None to the names I use... None of that helps: after retrieving around
13 objects the whole system is unusable. It looks like the memory that
should be freed is never reused. The DEBUG setting seems to have no
impact on the overall memory usage. Using values() is not an option
because I have to follow a few foreign keys.

Can you share any hints on how to reduce the memory usage in such a
situation? The underlying database structure is rather complicated and I
would prefer not to write all the queries by hand.
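
One pattern that can help in this situation (a sketch only, not
something the thread itself proposes) is keyset pagination: instead of
offset-based slices, remember the last primary key seen and fetch rows
above it, so each batch is an independent query and nothing outside the
current batch needs to stay referenced. In Django terms that would look
like qs.filter(pk__gt=last_pk).order_by('pk')[:batch_size]; the plain
Python below, with a hypothetical fetch_after callable standing in for
the query, just illustrates the control flow.

```python
def iter_by_pk(fetch_after, batch_size=2000):
    """Yield rows in primary-key order, batch by batch.

    fetch_after(last_pk, limit) must return up to `limit` rows (dicts
    with a "pk" key) whose pk is greater than last_pk, ordered by pk.
    Each call is an independent query, so rows from previous batches
    can be garbage-collected.
    """
    last_pk = 0
    while True:
        batch = fetch_after(last_pk, batch_size)
        if not batch:
            break
        for row in batch:
            yield row
        last_pk = batch[-1]["pk"]

# Stand-in for a table: pk 1..7 with a computed value.
rows = [{"pk": i, "val": i * i} for i in range(1, 8)]
fetch = lambda last, lim: [r for r in rows if r["pk"] > last][:lim]
out = [r["val"] for r in iter_by_pk(fetch, batch_size=3)]
```

Unlike OFFSET-based paging, each batch query stays cheap even deep
into the result set, because the database seeks directly on the
indexed primary key.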

-- 
Jarek Zgoda
Skype: jzgoda | GTalk: [EMAIL PROTECTED] | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)
