On Mon, Aug 23, 2010 at 7:32 PM, Andy <selforgani...@gmail.com> wrote:
> On Aug 20, 10:04 pm, Russell Keith-Magee <russ...@keith-magee.com>
> wrote:
>
>> Of course, given that you know your sharding scheme, you could use the
>> router directly.
>>
>> Tweet.objects.using(router.db_for_read(Tweet, author=a)).filter(author_id=a)
>
> Ah Thanks. This is what I need.
>
>> You won't get any argument from me. What's missing is a clear
>> suggestion on how we can encompass this problem in the general case.
>> Suggestions are welcome.
>
> One suggestion I have is that any arguments in filter() should also be
> passed as part of the hints dictionary to the database router. The
> arguments in filter() should be enough information to determine which
> shard to select in most cases.
>
> So in the above example:
>
> Tweet.objects.filter(author_id=a)
>
> The keyword:value pair {'author_id': a} should be added to the hints
> that got passed to the database router.
>
> This achieves basically the same results as doing
> Tweet.objects.using(router.db_for_read(Tweet,
> author_id=a)).filter(author_id=a)
> But it doesn't require going through the entire codebase and modifying
> every single queryset so it's less prone to error and more DRY.

Ok, so how are the following queries processed?

Tweet.objects.filter(author_id=a, other=b)

Tweet.objects.filter(author_id=a).filter(other=b)

Tweet.objects.filter(author_id=a).exclude(other=b)

Like I keep saying - we've given this some thought, and it's easy to
solve this for the simple case. It's the general case that poses a
problem.

This also steps around the fact that we don't actually store the
contents of filter() clauses once they're applied; they're converted
into query-specific representations as soon as they're added to a
queryset.

> I have another use case where I want to shard by primary pk which is
> an auto-increment. I have this model:
>
> class Auction(models.Model):
>     seller_id = models.IntegerField()
>     text = models.TextField()
>     price = models.DecimalField()
>
> The PK of Auction is the auto-increment "id" field.
>
> Say I divide Auction into 3 shards and set up each shard so that the
> auto-increment id's don't collide.
>
> When I first create and save a new auction, it doesn't have an "id",
> so I just want to randomly pick a shard to save to:
>
> def db_for_write(self, model, **hints):
>     if model.__name__ == "Auction" and 'instance' not in hints:
>         return random.choice(['shard1','shard2', 'shard3'])
>
> Would the above work?

Well, it will certainly work in the sense that you will write the
instance to a random shard. The issue is whether you will be able to
reliably retrieve the objects afterwards. That comes down to exactly
how your auto-increment scheme allocates primary keys. If you can
guarantee that the primary keys are allocated in a programmatic way,
so that the shard can be worked out from the id, then it will probably
work.

> Also how random is random - would I get a uniform distribution of
> records among the shards?

Depending on your level of mathematical rigor, that's not a simple
question. To a simple approximation, yes, you'll get a uniform
distribution.
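
As a rough sketch of both points, and not something tested against a real Django project: suppose, purely as an assumption, that the three shards hand out interleaved ids (shard1 gets 1, 4, 7, ...; shard2 gets 2, 5, 8, ...; shard3 gets 3, 6, 9, ...). Then the shard can be recovered from the id by simple arithmetic. The names shard_for_pk, AuctionRouter and sample_distribution are made up for illustration. The router is a plain class, and it checks the instance's pk rather than whether an 'instance' hint is present, since a hint may be passed for an unsaved object too.

```python
import random
from collections import Counter

# Shard aliases taken from the db_for_write example quoted above.
SHARDS = ["shard1", "shard2", "shard3"]


def shard_for_pk(pk):
    """Map a primary key back to the shard that issued it.

    Assumption: shard1 issues ids 1, 4, 7, ...; shard2 issues 2, 5, 8, ...;
    shard3 issues 3, 6, 9, ... A different allocation scheme needs a
    different mapping.
    """
    return SHARDS[(pk - 1) % len(SHARDS)]


class AuctionRouter:
    """Router sketch; only db_for_write is shown."""

    def db_for_write(self, model, **hints):
        if model.__name__ != "Auction":
            return None  # no opinion on other models
        instance = hints.get("instance")
        pk = getattr(instance, "pk", None)
        if pk is not None:
            # The object already has an id, so it belongs to a known shard.
            return shard_for_pk(pk)
        # No id yet: pick any shard for the first save.
        return random.choice(SHARDS)


def sample_distribution(n, seed=None):
    """Count how often each shard is picked over n random choices."""
    rng = random.Random(seed)
    return Counter(rng.choice(SHARDS) for _ in range(n))
```

Under that assumption, a lookup by id could go through Auction.objects.using(shard_for_pk(some_id)).get(pk=some_id), and sample_distribution(30000) should come back with three counts that are each close to 10000.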
However, the patterns and underlying distributions of random number
generators are the subject of continued research and improvement.

Yours,
Russ Magee %-)