Re: Offset/limit and looping over a large dataset

2013-03-27 Thread Stefano Crosta
Thanks Anssi for both your replies. These db interactions definitely are 
really beyond the scope of my knowledge, unluckily.

I did try and implement an iterator such as the one that you describe in 
your other message, based on index column filtering - what I dislike is 
that it's an additional function getting a qs parameter rather then 
naturally sitting in the queryset and (thus?) I did not manage to make it 
very generic...  Eg. some querysets don't even contain the key column 
anymore.. (the outlines of these are in my SO question but ). I'm sure it 
can be done better, especially if it's part of the queryset rather then an 
external function that gets an evaluated qs. I still manage to use it both 
for pagination and for very complex functions that I can't run in the DB.
As you say, it would still be nice to see this kind of generic iterator in 
core, even known the limitations (heck, the standard iterator just exploses 
the process by memory usage, so it's not that much better :) ).

WITH HOLD does not look like a safe solution if you have to explicitly 
close it (would need a separate thread with timeout maybe?), but I also 
read some discussions about creating a connection pool and persistent 
connections, so maybe at some point all these issues will find a common 
base for a solution..

As usual, I'll eagerly keep reading these posts and try to play with code 
on my own hoping to be able to contribute at some point..

Stefano

On Wednesday, March 27, 2013 10:36:02 AM UTC+1, Anssi Kääriäinen wrote:
>
> On 03/27/2013 10:53 AM, Stefano Crosta wrote: 
> > Thanks Aymeric, 
> > that's true, and I should have put that link too. 
> > sadly "akaariai" who's the only one who seemed to understand a bit 
> > about it (and is a core dev) did not seem interested in bringing this 
> > any forward, so I thought I'd try to raise some interest again! 
> The problem is that server side cursors will need dedicated API, and 
> there will be database specific problems when using server side cursors. 
> On SQLite changes done to rows are visible in the results, on other 
> databases not. On PostgreSQL you will need to use WITH HOLD cursors if 
> you want to use server side cursors outside transactions (that is, in 
> autocommit mode normal server side cursors do not work). When using WITH 
> HOLD cursors you must close the cursor explicitly or you will have 
> cursor leak... And on MySQL WITH HOLD cursors aren't available at all, 
> so you must be in transaction to use server side cursors. 
>
> Oracle seems to be the only core DB that will work without problems. In 
> fact, using .iterator() on Oracle already works without memory problems. 
>
> Maybe the dedicated API could be adding two new keywords to .iterator(): 
> server_side_cursor, and with_hold. with_hold=True implies 
> server_side_cursor=True. If you use with_hold=True you are responsible 
> for closing the iterator, too. The behaviour of server_side_cursor and 
> with_hold is database specific - it will be impossible to abstract the 
> differences away. 
>
>   - Anssi 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Offset/limit and looping over a large dataset

2013-03-27 Thread Anssi Kääriäinen

On 03/27/2013 10:53 AM, Stefano Crosta wrote:

Thanks Aymeric,
that's true, and I should have put that link too.
sadly "akaariai" who's the only one who seemed to understand a bit 
about it (and is a core dev) did not seem interested in bringing this 
any forward, so I thought I'd try to raise some interest again!
The problem is that server side cursors will need dedicated API, and 
there will be database specific problems when using server side cursors. 
On SQLite changes done to rows are visible in the results, on other 
databases not. On PostgreSQL you will need to use WITH HOLD cursors if 
you want to use server side cursors outside transactions (that is, in 
autocommit mode normal server side cursors do not work). When using WITH 
HOLD cursors you must close the cursor explicitly or you will have 
cursor leak... And on MySQL WITH HOLD cursors aren't available at all, 
so you must be in transaction to use server side cursors.


Oracle seems to be the only core DB that will work without problems. In 
fact, using .iterator() on Oracle already works without memory problems.


Maybe the dedicated API could be adding two new keywords to .iterator(): 
server_side_cursor, and with_hold. with_hold=True implies 
server_side_cursor=True. If you use with_hold=True you are responsible 
for closing the iterator, too. The behaviour of server_side_cursor and 
with_hold is database specific - it will be impossible to abstract the 
differences away.


 - Anssi

--
You received this message because you are subscribed to the Google Groups "Django 
developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Offset/limit and looping over a large dataset

2013-03-27 Thread Stefano Crosta
Thanks Aymeric, 
that's true, and I should have put that link too.
sadly "akaariai" who's the only one who seemed to understand a bit about it 
(and is a core dev) did not seem interested in bringing this any forward, 
so I thought I'd try to raise some interest again!

On Wednesday, March 27, 2013 9:21:51 AM UTC+1, Aymeric Augustin wrote:
>
> On 27 mars 2013, at 09:10, Stefano Crosta  
> wrote: 
>
> > it would be great to see server-side cursors and/or a smarter way to 
> iterate make it into core. 
>
> FYI this topic is tracked here: 
> https://code.djangoproject.com/ticket/16614 
>
> -- 
> Aymeric. 
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Offset/limit and looping over a large dataset

2013-03-27 Thread Anssi Kääriäinen
On 27 maalis, 10:10, Stefano Crosta  wrote:
> On Tuesday, March 26, 2013 10:47:34 PM UTC+1, Petite Abeille wrote:
>
> > On Mar 26, 2013, at 10:03 PM, Alex Gaynor 
> > wrote:
>
> > > For what it's worth, SQL2011 does define OFFSET, finally.
>
> > Perhaps worthwhile mentioning as well :
>
> > "Do not try to implement a scrolling window using LIMIT and OFFSET. Doing
> > so will become sluggish as the user scrolls down toward the bottom of the
> > list."
> > -- Scrolling Cursor, What Not To Do
> >http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor
>
> > Just saying...
>
> This is already getting slightly off-topic wrt the original discussion on
> EXISTS/IN, so I allowed myself to change the subject.
>
> But this is another important matter I think, and I've already seen mention
> of server-side cursors in these threads.

Server side cursor would be useful in many situations. However, it is
hard to implement server side cursors in a way the ORM could use them
automatically. See https://code.djangoproject.com/ticket/16614 for
details.

The usual problem with LIMIT/OFFSET is pagination and going deep into
pages list. Server side cursors would only help if you could somehow
keep the same cursor open between requests for different pages.

The solution to large resultset pagination is to sort by some index
you have, and then when you go to next page, you do this:
 
SomeModel.objects.filter(indexedcol__gt=prev_pages_last_obj.indexedcol).order_by('indexedcol')
[0:PAGE_SIZE]
This operation can use the index and is a very efficient way to
retrieve pages. The downside is that this only works for unique
indexed columns (or "unique enough" that duplicates do not matter in
practice). Also, you can't easily give page numbers for pagination,
you can only have links "first, previous, next, last". Still, if you
have a lot of objects then normal pagination of "count all, give links
by page numbers" simply does not scale. The count(*) itself can be too
expensive for large resultsets.

Having in-built support for this type of pagination is something I
would like to see in Django.

If you want to implement an automatically refreshed scrolling window
using AJAX and want the ability to sort by any column and do so
efficiently, then you will need to use server side cursors and have
some way to get the same cursor back for different AJAX requests. In
practice you would need some sort of connection pool where you could
store a connection for reuse, and ask the same connection back when
next request arrives. This seems complex to implement correctly and
doesn't seem like something that belongs into Django...

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Offset/limit and looping over a large dataset

2013-03-27 Thread Aymeric Augustin
On 27 mars 2013, at 09:10, Stefano Crosta  wrote:

> it would be great to see server-side cursors and/or a smarter way to iterate 
> make it into core. 

FYI this topic is tracked here: https://code.djangoproject.com/ticket/16614

-- 
Aymeric.



-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.