On Wed, Feb 9, 2011 at 8:44 AM, Bogdan Yakovenko <algorith...@gmail.com> wrote:
> Dear Developers,
>
> My name is Bogdan Yakovenko and I'm a graduate student at Warsaw University,
> Poland. I recently completed an internship at Facebook Inc. and am
> currently thinking about writing my master's thesis. I realized it would be
> great to contribute some ideas to the Django project and make some
> performance improvements as part of my thesis.

Hi, and thanks for offering to help out!

> Django seems to be a great framework for building sites quickly and
> transparently, but in my experience it is not as effective for huge
> sites that make extensive use of the database and memcache.

I don't know that I can completely agree with your assertion -- there
are plenty of very large, high traffic sites that use Django. However,
we're always interested in any ideas that make it easier to build and
maintain these sites, and if these changes yield performance
improvements for everyone else -- all the better.

> I have a couple of ideas that I want to implement in the Django framework,
> and I'd like your opinions on them. Perhaps they are already implemented
> and I just don't know how to use them, or there are alternative ways to
> achieve the same thing just as efficiently.
>
> Ok, let me describe the problem first.
>
> Let's consider a huge contemporary web site, such as a social network
> with millions of users. There are many requests each second and even more
> data fetches to generate all the responses. As an example, let's assume
> that every time you load someone's profile page you see full
> information about the user plus his/her profile pic, AND 5 of his/her random
> friends along with 5 of his/her random friends who were online in the last 5
> minutes, with their names and profile pics. Generating such a page
> requires 4 data fetches:
>
> 1. get user full profile
>
> 2. select 5 random ids of his/her friends
>
> 3. select 5 random ids of his/her online friends
>
> 4. get names and user pics for 2 and 3
>
> We can fetch all our data from the DB or from memcache. Since memcache is
> much more efficient and extremely scalable, we want to get as much data as
> possible from memcache and avoid DB calls; ideally we want to get all data
> from memcache. That is our first efficiency goal here. The second goal
> is to reduce the number of actual DB/memcache calls, i.e. use
> multifetch wherever possible instead of a sequence of single fetches.
>
> Here is a sketch of what I want to add to Django to support these goals.
>
> 1. I need a memcache mechanism for models.Model objects; currently only
> pages and template fragments can be natively cached.

That's not entirely true -- you can cache anything you want, but
Django only provides builtin helpers to assist with template fragments
and full pages.
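
As a rough illustration of caching an arbitrary object by hand, here is
a minimal sketch of the "ask the cache first, then the database"
pattern. DictCache, load_profile_from_db and get_profile are
hypothetical names made up for this example; the dict-backed cache only
stands in for a memcached-style backend with get/set/delete methods.

```python
class DictCache:
    """Hypothetical in-process stand-in for a memcached-style cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


db_hits = []  # records every simulated database fetch


def load_profile_from_db(pk):
    # Hypothetical placeholder for a real lookup such as
    # FullProfile.objects.get(pk=pk).
    db_hits.append(pk)
    return {"id": pk, "name": "user%d" % pk}


def get_profile(cache, pk):
    key = "fullprofile:%d" % pk
    profile = cache.get(key)
    if profile is None:          # cache miss: fall back to the database
        profile = load_profile_from_db(pk)
        cache.set(key, profile)  # store it for subsequent requests
    return profile


cache = DictCache()
first = get_profile(cache, 123)   # miss: goes to the "database"
second = get_profile(cache, 123)  # hit: served from the cache
```

Deleting the key (for example, when the profile is saved) forces the
next lookup back to the database.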

> I want
>
> FullProfile.objects.use_memcache().get(pk=123)
>
> to ask memcache first and then, if the object is not cached there, to fetch
> the object from the DB.
>
> There will also be a way to cache an object for future use and to delete
> one from the cache. This way, for example, I can memcache all profiles that
> are currently in use and avoid any DB query when I need someone's profile.

You might want to take a look at a couple of third-party caching
projects before you go too much further:

http://packages.python.org/johnny-cache/index.html
https://github.com/jbalogh/django-cache-machine

I would also ask why this needs to be part of the ORM like you are
proposing. Yes, caching is important, but it's an additional layer on
top of data access, not a core part of data access itself. As
demonstrated by the third-party projects I've referenced, it's
entirely feasible to add caching as an external layer, without
requiring modification to the ORM itself.

> 2. Sometimes you don't want to load all fields of a models.Model object;
> you want only some of them. Let's say you want to load a friends list:
> you have a FullProfile which contains a lot of data about the user, but
> you want to load only the name and userpic URL. It would make sense to
> create a ShortProfile class which contains only these two fields and is
> marked as a part of FullProfile. That would reduce the amount of data
> exchanged in a memcache/DB fetch, especially when the list is huge.

This exact use case is why Django provides only() and defer().
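
In ORM terms that would be a call along the lines of
FullProfile.objects.only("name", "userpic_url"), where the field names
are assumptions based on your description. The effect at the SQL level
can be sketched with the standard sqlite3 module; the table layout
below is hypothetical and the example does not use Django itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fullprofile ("
    "id INTEGER PRIMARY KEY, name TEXT, userpic_url TEXT, bio TEXT)"
)
conn.execute(
    "INSERT INTO fullprofile (id, name, userpic_url, bio) VALUES (?, ?, ?, ?)",
    (1, "alice", "/pics/alice.png", "a long biography " * 100),
)

# Fetch only the columns a friends list needs; the large bio column
# is never transferred.
cursor = conn.execute("SELECT id, name, userpic_url FROM fullprofile")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```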

> 3. As I understand it, all queries are currently lazily evaluated. That
> means that DB access happens as single fetches, whenever the app needs some
> data. What I want to implement is a way for the user to prefetch multiple
> queries at the same time.
>
> Let's say you have querySet1, querySet2 and querySet3 - one is for getting
> user full profile, another for getting random friends and the last one for
> getting online friends.
>
> I'd like to have something like
>
> prefetch([querySet1, querySet2, querySet3])
>
> which loads all my data from the DB (and memcache, once 1. is implemented)
> using only one DB multifetch query.

An interface that would allow multifetch would be interesting.
Multi-write would possibly also fit in this category. For example, if
you insert 4 values into a many-to-many relationship, this is
currently implemented as 4 INSERT statements. Of course, it could
easily be handled as a single INSERT with 4 value pairs -- except that
Django's ORM doesn't currently have that facility.

These two should be relatively simple additions to the existing ORM.
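
To make the multi-write case concrete, here is a minimal sketch using
the standard sqlite3 module rather than Django's ORM. The table and
column names are hypothetical; the point is only that four value pairs
can travel in a single INSERT statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_friends (profile_id INTEGER, friend_id INTEGER)"
)

pairs = [(1, 2), (1, 3), (1, 4), (1, 5)]

# Build one "(?, ?)" placeholder group per row so that all pairs go out
# in one statement (multi-row VALUES needs SQLite 3.7.11 or later).
placeholders = ", ".join(["(?, ?)"] * len(pairs))
sql = (
    "INSERT INTO profile_friends (profile_id, friend_id) VALUES "
    + placeholders
)
params = [value for pair in pairs for value in pair]

conn.execute(sql, params)  # one statement instead of four
stored = conn.execute(
    "SELECT profile_id, friend_id FROM profile_friends ORDER BY friend_id"
).fetchall()
```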

Another interesting, but more complicated opportunity in this area
would be asynchronous database calls [1], which some databases (by
which I mean, Postgres) support.

[1] http://initd.org/psycopg/docs/advanced.html#asynchronous-support

This sort of interface would allow you to push multiple queries onto
the database, continue processing in your view, and only retrieve the
data when it's actually required. This exploits the lazy evaluation of
querysets: the time between constructing a queryset and first reading
its results is used to issue the query and compute its result.
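
The shape of that interface can be sketched with plain Python futures.
Note that this uses a thread pool, not the database driver's
asynchronous support, and run_query is a hypothetical stand-in for a
database round trip; it only illustrates the "issue now, collect
later" flow.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(name, delay):
    # Hypothetical stand-in for a database round trip.
    time.sleep(delay)
    return "%s rows" % name


with ThreadPoolExecutor(max_workers=3) as pool:
    # Issue all three queries immediately; none of these calls blocks.
    profile = pool.submit(run_query, "profile", 0.05)
    friends = pool.submit(run_query, "random_friends", 0.05)
    online = pool.submit(run_query, "online_friends", 0.05)

    # ... the view could do unrelated work here while the queries run ...

    # Block only at the point where each result is actually needed.
    results = [profile.result(), friends.result(), online.result()]
```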

Implementing asynchronous database access isn't a trivial task. But
then, you said this was for your Masters, so non-trivial work should
be what you are looking for :-)

> That's all I want to do. The design I've presented here is quite
> preliminary, but I'd appreciate any comments or thoughts on it.

My summary: Suggestions (1) and (2) are largely solved problems --
there may be room for improvement, but it's certainly not a blank
slate in those areas. Suggestion (3) is definitely an area where
improvement is possible.

Best of luck with your Masters! If you want to refine these ideas some
more, we're here to help.

Yours,
Russ Magee %-)
