Hi Karen

sounds reasonable. But look, Malcolm actually means "there should be no
problem with multiple processes, this is the way all web server stuff
works". A contradiction? I think the contradiction is actually resolved by
the initialization of the DB connection that every app process does on its
own after being started by the web server.
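
To make that concrete: as far as I understand, Django opens the DB
connection lazily, on the first query a process actually runs, so every
web-server worker ends up with a connection of its own. A tiny sketch of my
understanding (I am assuming that connection.connection really is the
underlying DB-API connection; that is how I read the backend code, it isn't
something I found documented):

from django.db import connection
from mytsite.myapp.models import DBModel

# right after the process starts, no real DB connection exists yet
print connection.connection      # None -- nothing has been opened so far

# the first query opens a connection for *this* process only
DBModel.objects.count()
print connection.connection      # now a live DB-API connection object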

By the way, initiating a new connection isn't a particularly fast
operation... it would be nice to reuse existing ones.

Well, it looks like I have to add some rationale on WHY I actually need
this. The DB queries are interleaved with actions that behave roughly like
"sleep(1)". It is difficult to separate the queries from these "sleepy"
actions, but at a high level, dividing this soup into 10 or 20 parts is no
problem at all from an algorithmic point of view. It's just that Django
won't let me query the DB via the same Model object from different
processes...
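
The only workaround I can think of, picking up Karen's suggestion in the
message quoted below, is to make sure no connection is shared across the
fork at all: close the connection in the parent right before calling pmap,
so that each forked worker lazily opens its own, private connection on its
first query. An untested sketch (fetch_all is just a name I made up for the
wrapper):

from django.db import connection
from mytsite.myapp.models import DBModel
import pprocess

def fetch_object(search_term):
    # runs in a forked child; the parent closed its connection before
    # forking, so the first ORM call below opens a connection that is
    # private to this process
    obj = DBModel.objects.filter(myfield=search_term)[0]
    obj.access_counter = obj.access_counter + 1
    obj.save()
    return obj.__dict__.copy()

def fetch_all(search_terms):
    # drop the (possibly open) connection *before* forking, so the
    # children do not inherit a live socket to the database
    connection.close()
    return pprocess.pmap(fetch_object, search_terms)

(The read-increment-save on access_counter still isn't atomic across
processes, but that is a separate problem from the shared connection.)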

Regards,
Valery

On Apr 2, 7:24 pm, "Karen Tracey" <[EMAIL PROTECTED]> wrote:
> On Wed, Apr 2, 2008 at 5:16 AM, Valery <[EMAIL PROTECTED]> wrote:
>
> > Hi Malcolm,
>
> > many thanks for your reply. Let me give some more details.
>
> > I use parallelized fetching of database objects like this:
>
> > my_dictionaries = pprocess.pmap( fetch_object, ["searchterm1",
> > "searchterm2", "searchterm3"])
>
> > where the fetch_object function is not read-only, but something like this:
>
> > from mytsite.myapp.models import DBModel
> > def fetch_object(search_term):
> >    lst = DBModel.objects.filter(myfield=search_term)
> >    ## now, a bit oversimplified, I have the following:
> >    lst[0].access_counter = lst[0].access_counter + 1
> >    lst[0].save()
> >    return lst[0].__dict__.copy()
>
> > During the execution of the pmap call shown above I get various
> > sporadic errors, like:
> > "Broken pipe"
> > "no results to fetch"
> > which seem to be errors coming from the pickle module during
> > object serialization over the pipe used by pprocess.
>
> I'm just guessing, but I think what may be happening here is --
>
> 1. During request processing before you issue the pmap call, a connection to
> the database is established.  This may be a TCP/IP connection or something
> else, but it is likely represented by a file descriptor + whatever cursor
> datastructure is maintained by the underlying database backend.
> 2. When you call pmap, this file descriptor is dup'd into the forked
> processes.  The file descriptor is duplicated, but there is still only one
> underlying connection to the database, with multiple processes trying to
> use it.
> 3. Bad things happen.  One process reads everything available on the
> connection, thus consuming results another is expecting ("no results to
> fetch"), or the database gets confused by a mixture of requests sent to it
> and closes the connection to an incoherent client ("Broken pipe"), etc.
>
> As Malcolm mentions above, Django makes sure that each request handling
> thread has its own connection to the database, but when you call pmap here
> you're causing multiple processes to all share the same connection,
> and that just doesn't work.  Maybe there is some way to write your
> fetch_object() routine so that it re-initializes the connection to the
> database so that each instance of fetch_object would have its own
> connection, but offhand I don't know the magic incantation to do that.
>
> Karen
>
> > By the way, maybe Django's DB Model isn't very compatible with
> > serialization à la the pickle module?
>
> > Regards
> > Valery
>
> > On Apr 2, 10:46 am, Malcolm Tredinnick <[EMAIL PROTECTED]>
> > wrote:
> > > On Wed, 2008-04-02 at 01:21 -0700, Valery wrote:
> > > > Hi
>
> > > > did anyone here use Django in parallelized computations?
>
> > > > For about a year I have been using a great parallelization approach
> > > > based on the 'pmap' function from the 'pprocess' module:
> > > > http://www.boddie.org.uk/python/pprocess.html
>
> > > > My own code is designed to be strongly side-effect-free; however, I am
> > > > experiencing strange problems after switching to Django.
> > > > Perhaps that is because I have a wrong understanding of Django's DB
> > > > access model.
>
> > > > Perhaps Django's class django.db.models.Model should be used with more
> > > > care under parallelized access?
>
> > > > What about locking there?
>
> > > > What about memory read/write conflicts there?
>
> > > You'll probably need to provide more details for us to be able to help
> > > here. "Experience strange problems" isn't a particularly specific
> > > problem description.
>
> > > What part of the request/response pipeline are you trying to do in
> > > parallel? If you're using separate processes, things should pretty much
> > > just work, I would think. After all, multiple processes running Django
> > > code at once is *precisely* how a web server works when it's handling
> > > Django. It's not like Django can only serve one request per machine at
> > > once. If you're trying to do multi-threaded operations within the same
> > > process within the same request/response handling, all bets are off. For
> > > example, some of the database backends we support cannot handle sharing
> > > cursors between multiple threads, or even sharing connections between
> > > threads. Each thread must therefore have its own connection and we're
> > > careful to do that as part of starting up a new request each time. But
> > > the module you point to (which I've never used) says it uses forking to
> > > create new processes, rather than just creating threads, so that
> > > shouldn't be an issue.
>
> > > There's nothing that would necessarily interfere with anything at the
> > > Python level, since they're in entirely different process spaces. You
> > > will get the standard transaction-based interaction if you're using a
> > > database that has any kind of transaction support (e.g. even with
> > > SQLite, updates in one process won't be visible to another process until
> > > the first process commits the result), but that's just normal
> > > parallel-access database behaviour.
>
> > > Do remember, though, that a fork() duplicates all your open file
> > > descriptors and things like that, so anything could happen if multiple
> > > things are trying to write back to the response path, for example.
> > > Again, it all depends on your code and that's where a better description
> > > of the "strange problems" will no doubt help.
>
> > > My general hunch, though, is that even if there are problems with things
> > > like the duped file descriptors, it's not something we should worry
> > > about. The "shared nothing" style of design is pretty useful for
> > > request-triggered applications and you're actually trying to do "shared
> > > something" work here within a single request path.
>
> > > Regards,
> > > Malcolm
>
> > > --
> > > Experience is something you don't get until just after you need it.
> > > http://www.pointy-stick.com/blog/