Re: Offset/limit and looping over a large dataset
Thanks Anssi for both your replies. These db interactions definitely are really beyond the scope of my knowledge, unluckily. I did try and implement an iterator such as the one that you describe in your other message, based on index column filtering - what I dislike is that it's an additional function getting a qs parameter rather then naturally sitting in the queryset and (thus?) I did not manage to make it very generic... Eg. some querysets don't even contain the key column anymore.. (the outlines of these are in my SO question but ). I'm sure it can be done better, especially if it's part of the queryset rather then an external function that gets an evaluated qs. I still manage to use it both for pagination and for very complex functions that I can't run in the DB. As you say, it would still be nice to see this kind of generic iterator in core, even known the limitations (heck, the standard iterator just exploses the process by memory usage, so it's not that much better :) ). WITH HOLD does not look like a safe solution if you have to explicitly close it (would need a separate thread with timeout maybe?), but I also read some discussions about creating a connection pool and persistent connections, so maybe at some point all these issues will find a common base for a solution.. As usual, I'll eagerly keep reading these posts and try to play with code on my own hoping to be able to contribute at some point.. Stefano On Wednesday, March 27, 2013 10:36:02 AM UTC+1, Anssi Kääriäinen wrote: > > On 03/27/2013 10:53 AM, Stefano Crosta wrote: > > Thanks Aymeric, > > that's true, and I should have put that link too. > > sadly "akaariai" who's the only one who seemed to understand a bit > > about it (and is a core dev) did not seem interested in bringing this > > any forward, so I thought I'd try to raise some interest again! > The problem is that server side cursors will need dedicated API, and > there will be database specific problems when using server side cursors. > On SQLite changes done to rows are visible in the results, on other > databases not. On PostgreSQL you will need to use WITH HOLD cursors if > you want to use server side cursors outside transactions (that is, in > autocommit mode normal server side cursors do not work). When using WITH > HOLD cursors you must close the cursor explicitly or you will have > cursor leak... And on MySQL WITH HOLD cursors aren't available at all, > so you must be in transaction to use server side cursors. > > Oracle seems to be the only core DB that will work without problems. In > fact, using .iterator() on Oracle already works without memory problems. > > Maybe the dedicated API could be adding two new keywords to .iterator(): > server_side_cursor, and with_hold. with_hold=True implies > server_side_cursor=True. If you use with_hold=True you are responsible > for closing the iterator, too. The behaviour of server_side_cursor and > with_hold is database specific - it will be impossible to abstract the > differences away. > > - Anssi > -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: Offset/limit and looping over a large dataset
On 03/27/2013 10:53 AM, Stefano Crosta wrote: Thanks Aymeric, that's true, and I should have put that link too. sadly "akaariai" who's the only one who seemed to understand a bit about it (and is a core dev) did not seem interested in bringing this any forward, so I thought I'd try to raise some interest again! The problem is that server side cursors will need dedicated API, and there will be database specific problems when using server side cursors. On SQLite changes done to rows are visible in the results, on other databases not. On PostgreSQL you will need to use WITH HOLD cursors if you want to use server side cursors outside transactions (that is, in autocommit mode normal server side cursors do not work). When using WITH HOLD cursors you must close the cursor explicitly or you will have cursor leak... And on MySQL WITH HOLD cursors aren't available at all, so you must be in transaction to use server side cursors. Oracle seems to be the only core DB that will work without problems. In fact, using .iterator() on Oracle already works without memory problems. Maybe the dedicated API could be adding two new keywords to .iterator(): server_side_cursor, and with_hold. with_hold=True implies server_side_cursor=True. If you use with_hold=True you are responsible for closing the iterator, too. The behaviour of server_side_cursor and with_hold is database specific - it will be impossible to abstract the differences away. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: Offset/limit and looping over a large dataset
Thanks Aymeric, that's true, and I should have put that link too. sadly "akaariai" who's the only one who seemed to understand a bit about it (and is a core dev) did not seem interested in bringing this any forward, so I thought I'd try to raise some interest again! On Wednesday, March 27, 2013 9:21:51 AM UTC+1, Aymeric Augustin wrote: > > On 27 mars 2013, at 09:10, Stefano Crosta> wrote: > > > it would be great to see server-side cursors and/or a smarter way to > iterate make it into core. > > FYI this topic is tracked here: > https://code.djangoproject.com/ticket/16614 > > -- > Aymeric. > > > > -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: Offset/limit and looping over a large dataset
On 27 maalis, 10:10, Stefano Crostawrote: > On Tuesday, March 26, 2013 10:47:34 PM UTC+1, Petite Abeille wrote: > > > On Mar 26, 2013, at 10:03 PM, Alex Gaynor > > wrote: > > > > For what it's worth, SQL2011 does define OFFSET, finally. > > > Perhaps worthwhile mentioning as well : > > > "Do not try to implement a scrolling window using LIMIT and OFFSET. Doing > > so will become sluggish as the user scrolls down toward the bottom of the > > list." > > -- Scrolling Cursor, What Not To Do > >http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor > > > Just saying... > > This is already getting slightly off-topic wrt the original discussion on > EXISTS/IN, so I allowed myself to change the subject. > > But this is another important matter I think, and I've already seen mention > of server-side cursors in these threads. Server side cursor would be useful in many situations. However, it is hard to implement server side cursors in a way the ORM could use them automatically. See https://code.djangoproject.com/ticket/16614 for details. The usual problem with LIMIT/OFFSET is pagination and going deep into pages list. Server side cursors would only help if you could somehow keep the same cursor open between requests for different pages. The solution to large resultset pagination is to sort by some index you have, and then when you go to next page, you do this: SomeModel.objects.filter(indexedcol__gt=prev_pages_last_obj.indexedcol).order_by('indexedcol') [0:PAGE_SIZE] This operation can use the index and is a very efficient way to retrieve pages. The downside is that this only works for unique indexed columns (or "unique enough" that duplicates do not matter in practice). Also, you can't easily give page numbers for pagination, you can only have links "first, previous, next, last". Still, if you have a lot of objects then normal pagination of "count all, give links by page numbers" simply does not scale. The count(*) itself can be too expensive for large resultsets. Having in-built support for this type of pagination is something I would like to see in Django. If you want to implement an automatically refreshed scrolling window using AJAX and want the ability to sort by any column and do so efficiently, then you will need to use server side cursors and have some way to get the same cursor back for different AJAX requests. In practice you would need some sort of connection pool where you could store a connection for reuse, and ask the same connection back when next request arrives. This seems complex to implement correctly and doesn't seem like something that belongs into Django... - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: Offset/limit and looping over a large dataset
On 27 mars 2013, at 09:10, Stefano Crostawrote: > it would be great to see server-side cursors and/or a smarter way to iterate > make it into core. FYI this topic is tracked here: https://code.djangoproject.com/ticket/16614 -- Aymeric. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers?hl=en. For more options, visit https://groups.google.com/groups/opt_out.