Hi Peter
On 18 March 2013 16:35, Peter of the Norse wrote:
> You’re over thinking this by a large margin. getTendersForContractor
> should return a QuerySet. That way you can add things like .select_related
> as necessary in the view.
>
> Also, I’m a bit worried about the fact you have a function called
> getTendersForContractor. It really should be Tender.objects.forContractor.
> Or if the function is just a single filter call, it can be eliminated
> entirely.
>
To confirm, this would require a new Manager that implements the
'forContractor' method?
> * Each prefetch_related hits the database once - a lot better than 3k
> hits, but not as minimal as a sql join which joins all of the tables
>
> Actually, no. If you were to try and do the prefetch_related as a single
> SQL join, you would return an insane number of rows. A single
> prefetch_related wouldn't be a problem, but if you were to try and do a
> select with more than one many field, they would multiply. If you have an
> example of a join statement that doesn’t do that, we’d love to see it.
>
You're correct
>
> * Prefetch_related selects related obects via the criteria: 'in (list of
> ids'), which arguably can be slower
>
> What database are you using where this is true?
>
It was the result of a discussion with a colleague. Also slides 67 & 68 at
http://www.slideshare.net/OReillyOSCON/unbreaking-your-django-applicationimply
that large IN's may cause performance problems -
"68: Large INs are a mess. Very expensive for the database to parse.
Very expensive for the database to execute. If there are potentially more
than 10-15 items in the list, rework the IN as a JOIN against whatever the
source of the keys is."
>
> * Seems brittle - if you change the data model, you have to change all of
> the queries in the data retrieval layer. Denomralising the data would
> reduce this, but we'd still have the problem of retrieving more data than
> the client may possibly want
>
> Yes, if you change code in one place, you might have to change it
> someplace else to match it. Most of us just use the ORM as the data
> retrieval layer. Again, I’m worried you’re over thinking this.
>
I hadn't really considered passing around a QuerySet in place of a list of
Tenders. One thing I'm finding is that the cost in retrieving related
objects is making me rethink the API presented to the client. For example,
consider the following API, which I think is quite reasonable:
class Tender:
def getTotalCost(self):
return self.project.getTotalCost()
class Project:
def getTotalCost(self):
cost = 0
for contract in self.contract_set.all():
cost += constract.getTotalCost()
return cost
similarly implementations for Project & Contract
class Contract:
def getTotalCost(self):
cost = 0
for job in self.job_set.all():
cost += job.getTotalCost()
return cost
Having a Project know how much its total cost is, and presenting this to a
client, is in line with OO principles of abstraction and encapsulation.
Consider a page that outputs the total cost of each contract tendered for a
particular contractor. If we re-use the 'getTendersForContractor' function
(that returns a QuerySet), and then call 'getTotalCost' on each Project
that has a tender, then we still have a problem in that we'll get a
multitude of calls to the database.
We might put the onus on the client to call:
qs = getTendersForContractor(contractor)
for tender in list(qs.prefetch_related('project', 'project__contract',
'project__contract')):
print tender, tender.getTotalCost()
Not only is this complicated, but this requires the client know how the
Tender is implemented (the prefetched would become uneccessary if we
changed the implementation to store the total cost on the actual Tender
itself at the cost of denormalised data). Note that you wouldn't have these
problems if the data existed in memory.
These points are making me rethink my data model and API design, and that
the OO paradigm doesn't always fit well with web apps. In particular the
slides at
http://www.slideshare.net/OReillyOSCON/unbreaking-your-django-application:
23: "Django doesn’t have a good primitive for returning an object graph in
one call. - multiple db trips required" Implies that object graphs like
what I've designed is difficult to implement effeciently..?
24: "So, what do to? If you almost always return the parent along with the
related object, consider object inheritance instead. That does do a nice
join operation. Don’t create an insane class graph, though. If you
frequently use the related object without reference to the parent object,
use a foreign key." Not sure what 'object inheritance' is but might be
useful, and concerned that I might be heading towards an 'insane class
graph' (the real data model has 6 or 7 more related objects). Am I heading
down the wrong path of embracing an OO