Suggestions for Paginating a Growing Django Queryset

2014-05-30 Thread Riegie Godwin
Hello everyone!
 
  I've been stuck for about a week trying to figure this out. Been on 
all sorts of blogs and asked questions on stackoverflow in hopes that it 
would lead in the right direction. I would like to know what you think I 
could do.
 
Let's say I have 4 models, a User model, Blog Model, Hashtag and a Comment 
model. Users can create blogs, users can comment on them and blogs can be 
tagged by the blog owner from the comments or the blog description itself. 
If I we're to best describe this, think of Instagram. You can hashtag your 
photo when your uploading it, and afterwards by commenting on it. 
 
Here are the models:
 
#models.py
 
from django.db import models
from django.contrib.auth.models import User
 
class Blog(models.Model):
   user = models.ForeignKey(User)
   description = models.CharField(max_length=140, blank=True)
   blog_text = models.CharField(max_length=5000,  blank=True)
   related_hashtags = models.ManyToManyField('Hashtag', 
related_name = 'tagged_post', null=True, blank=True)
 
 
class Comment(models.Model):
   user = models.ForeignKey(User)
   comment = models.CharField(max_length=140, blank=True)
 
class Hashtag(models.Model):
   user = models.ForeignKey(User)
   comment = models.CharField(max_length=140, blank=True)
   count = models.IntegerField(default=1)
 
Pretty simple. A blog can have multiple hashtags and a hashtag can belong 
to multiple blogs. When a user creates a blog, I check the description 
string for any hashtags using the following function below:
 
 hash_tags = re.findall(r'#(\w+)', blog.description, re.UNICODE)[:30] # 
each blog can have a maximum of 30 hashtags
 if hash_tags:
for hash_tag in hash_tags:
   try:
  oldHashtag = 
Hashtag.objects.get(hashtag = hash_tag)
  oldHashtag.count = 
oldHashtag.count + 1
  oldHashtag.save()
  
blog.related_hashtag.add(oldHashtag)

   except Hashtag.DoesNotExist:
   newHashtag = 
Hashtag.objects.create(hashtag = lowercase_hash_tag)
  
 blog.related_hashtag.add(newHashtag)

I hope so far everything makes sense. I do the same as above whenever a 
Comment object is created.

Now let's imagine the hashtag #acdc is really popular, about 20-40 blogs 
are added to that hashtag about every millisecond.

If the user searches for #acdc, I would like to show the latest 20 blogs 
that have just been added with #acdc, and as the user loads more, I'll show 
the next 20. They are ordered in DESC order of insertion.

I had to resort to RAW SQL since Django couldn't order the queryset by the 
pk of the ManyToManyField, many others suggested using a through table, but 
for legacy reasons, I did it this way:

SELECT * FROM django_blog INNER JOIN django_blog_related_hashtag ON ( 
django_blog.id = django_blog_related_hashtag.blog_id ) INNER JOIN 
django_hashtag ON ( django_blog_related_hashtag.hashtag_id = 
django_hashtag.id ) WHERE ( django_hashtag.hashtag = 'acdc' ) ORDER BY 
django_blog_related_hashtag.id DESC LIMIT 20

If the user wanted to view page two, I'd simply add an OFFSET at the end of 
the query. A simple function like so:

OFFSET = (page_number -1) * 20

This all works great, but for web apps like Instagram or Twitter or this 
Blog app I just made, if the queryset is constantly growing, the user is 
not always in sync with the most recent data. Which is perfectly fine, I 
don't care about that. But what I do care about is that the next page that 
they request must contain the results that are suppose to come after the 
ones they have just received. If I don't account for this, users might see 
blogs that they have already seen from previous pages because of the 
growing queryset. Someone mentioned caching the entire queryset, but 
wouldn't that have a large memory over head?

I'm looking forward to hearing your thoughts and suggestions.

Kind Regards,

Riegie Godwin

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/c9dace38-4cde-4462-98bb-3ff689b14c7c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Suggestions for Paginating a Growing Django Queryset

2014-05-30 Thread Babatunde Akinyanmi
Like someone said, you can consider caching the entire queryset. That will
definitely impact memory. Another idea is to play with timestamps in your
query since you don't care if the results are not accurate. Maybe you
shouldn't paginate then you can draw out the whole queryset, render them on
the page and hide some with JavaScript. This could also impact memory at
the server end and loading time on the client.

Personally i don't see anything wrong with users seeing a blog on two pages
because they will know that the reason is because things have changed.
 On 30 May 2014 23:12, "Riegie Godwin"  wrote:

> Hello everyone!
>
>   I've been stuck for about a week trying to figure this out. Been on
> all sorts of blogs and asked questions on stackoverflow in hopes that it
> would lead in the right direction. I would like to know what you think I
> could do.
>
> Let's say I have 4 models, a User model, Blog Model, Hashtag and a Comment
> model. Users can create blogs, users can comment on them and blogs can be
> tagged by the blog owner from the comments or the blog description itself.
> If I we're to best describe this, think of Instagram. You can hashtag your
> photo when your uploading it, and afterwards by commenting on it.
>
> Here are the models:
>
> #models.py
>
> from django.db import models
> from django.contrib.auth.models import User
>
> class Blog(models.Model):
>user = models.ForeignKey(User)
>description = models.CharField(max_length=140, blank=True)
>blog_text = models.CharField(max_length=5000,  blank=True)
>related_hashtags = models.ManyToManyField('Hashtag',
> related_name = 'tagged_post', null=True, blank=True)
>
>
> class Comment(models.Model):
>user = models.ForeignKey(User)
>comment = models.CharField(max_length=140, blank=True)
>
> class Hashtag(models.Model):
>user = models.ForeignKey(User)
>comment = models.CharField(max_length=140, blank=True)
>count = models.IntegerField(default=1)
>
> Pretty simple. A blog can have multiple hashtags and a hashtag can belong
> to multiple blogs. When a user creates a blog, I check the description
> string for any hashtags using the following function below:
>
>  hash_tags = re.findall(r'#(\w+)', blog.description, re.UNICODE)[:30] #
> each blog can have a maximum of 30 hashtags
>  if hash_tags:
> for hash_tag in hash_tags:
>try:
>   oldHashtag =
> Hashtag.objects.get(hashtag = hash_tag)
>   oldHashtag.count =
> oldHashtag.count + 1
>   oldHashtag.save()
>
> blog.related_hashtag.add(oldHashtag)
>
>except Hashtag.DoesNotExist:
>newHashtag =
> Hashtag.objects.create(hashtag = lowercase_hash_tag)
>
>  blog.related_hashtag.add(newHashtag)
>
> I hope so far everything makes sense. I do the same as above whenever a
> Comment object is created.
>
> Now let's imagine the hashtag #acdc is really popular, about 20-40 blogs
> are added to that hashtag about every millisecond.
>
> If the user searches for #acdc, I would like to show the latest 20 blogs
> that have just been added with #acdc, and as the user loads more, I'll show
> the next 20. They are ordered in DESC order of insertion.
>
> I had to resort to RAW SQL since Django couldn't order the queryset by the
> pk of the ManyToManyField, many others suggested using a through table, but
> for legacy reasons, I did it this way:
>
> SELECT * FROM django_blog INNER JOIN django_blog_related_hashtag ON (
> django_blog.id = django_blog_related_hashtag.blog_id ) INNER JOIN
> django_hashtag ON ( django_blog_related_hashtag.hashtag_id =
> django_hashtag.id ) WHERE ( django_hashtag.hashtag = 'acdc' ) ORDER BY
> django_blog_related_hashtag.id DESC LIMIT 20
>
> If the user wanted to view page two, I'd simply add an OFFSET at the end
> of the query. A simple function like so:
>
> OFFSET = (page_number -1) * 20
>
> This all works great, but for web apps like Instagram or Twitter or this
> Blog app I just made, if the queryset is constantly growing, the user is
> not always in sync with the most recent data. Which is perfectly fine, I
> don't care about that. But what I do care about is that the next page that
> they request must contain the results that are suppose to come after the
> ones they have just received. If I don't account for this, users might see
> blogs that they have already seen from previous pages because of the
> growing queryset. Someone mentioned caching the entire queryset, but
> wouldn't that have a large memory over head?
>
> I'm looking forward to hearing your thoughts and suggestions.
>
> Kind Regards,
>
> Riegie Godwin
>
> --
> You received this message because you are subscribed to the Google Groups