Re: #7052 - Fixing serialization for contrib.contenttypes and contrib.auth

Russell Keith-Magee Thu, 03 Dec 2009 19:56:53 -0800

On Fri, Dec 4, 2009 at 11:46 AM, mattimust...@gmail.com
<mattimust...@gmail.com> wrote:
>
>
> On Dec 4, 2:33 am, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
>> Hi all,
>>
>> I've been looking at ticket #7052 again. I've got a draft patch up on
>> Trac, and I'd like feedback on the approach.
>>
>> Previously, I've been advocating the approach of embedding queries
>> into the serialization syntax - essentially, interpreting dictionaries
>> in JSON (and equivalent in other formats) as keyword arguments to a
>> Model.objects.get() call.
>>
>> This approach works, but very rapidly gets messy. It's painfully easy
>> to write a fixture that has circular, nested, or otherwise horribly
>> knotted dependencies. There is also the issue of serialization - a
>> query-based syntax is easy to deserialize, but how do you determine
>> which fields should be included in a serialization?
>>
>> So, I've taken a different approach with this new patch. The new
>> approach is much simpler and more explicit than the last. Rather than
>> trying to embed a query into the serialization language, I've taken a
>> step back and looked at the problem a different way.
>>
>> If there is some mechanism that can be used to look up instances of a
>> model, then by definition, it must be a surrogate key of some kind.
>> Completely independent of serialization, it would be nifty to be able
>> to perform lookups based on this surrogate key.
>>
>> So, let's add a convention for these methods. As an example, consider
>> contrib.auth Permissions (models have been slightly simplified for
>> demo purposes).
>>
>> class PermissionManager(models.Manager):
>>     def get_by_surrogate(self, key):
>>         codename, ct_key = key.split('|',1)
>>         return self.get(
>>             codename=codename,
>>             content_type=ContentType.objects.get_by_surrogate(ct_key)
>>         )
>>
>> class Permission(models.Model):
>>     name = models.CharField(max_length=50)
>>     content_type = models.ForeignKey(ContentType)
>>     codename = models.CharField(max_length=100)
>>     objects = PermissionManager()
>>
>>     def surrogate_key(self):
>>         return '%s|%s' % (self.codename, self.content_type.surrogate_key())
>>
>> There are two additions here - a get_by_surrogate() method on the
>> default manager, and a surrogate_key() method on the model itself. If
>> I have an instance of permission, I can call p.surrogate_key(), which
>> will return 'add_user|auth:user'. If I want to get a particular
>> permission, I can call
>> Permission.objects.get_by_surrogate('add_user|auth:user'), and that
>> will resolve to the appropriate permission (or raise an exception if
>> no answer exists).
>>
>> So far, this is independent of serialization. These methods could be
>> useful to end users for looking up permissions or content types.
>>
>> However, they're also really useful for serialization. The serializers
>> can use the existence of these methods as cues for changes in
>> serialization behavior. If a get_by_surrogate() method is found on the
>> manager, the deserializer will use that method to look up objects; if
>> surrogate_key() exists on the model, that will be used to serialize
>> references to the object instead of using primary key values.
>>
>> So, a JSON serialized reference to a permission that previously read:
>>
>> {
>>     "pk": 1,
>>     "model": "auth.User",
>>     "fields": {
>>         ...
>>         "permissions": [1, 3]
>>     }
>>
>> }
>>
>> will now read:
>>
>> {
>>     "pk": 1,
>>     "model": "auth.User",
>>     "fields": {
>>         ...
>>         "permissions": ["add_user|auth:user", "delete_user|auth:user"]
>>     }
>>
>> }
>>
>> The patch attached to #7052 implements this scheme, and includes
>> surrogate key definitions for Permission and ContentType, plus tests
>> to validate that this approach works.
>>
>> Two possible objections:
>>
>>  * It's a string-based serialization format. This means the developer
>> will need to write parsing code and a microsyntax to handle composite
>> surrogate keys (e.g., separating permission name from content type
>> with a pipe, and separating app_label from model with a colon).
>>
>>  * You can only define one surrogate key. Some models might have more
>> than one natural serialization.
>>
>> Personally, I'm comfortable with both of these limitations, but I'm
>> interested in hearing other opinions.
>>
>> On a practical note - there is one failing test, which highlights the
>> one flaw in this scheme that I am aware of. There is still a circular
>> dependency problem - an object must be deserialized before it can be
>> referenced using a surrogate key. I have a couple of ideas of how to
>> address this (essentially fixing it at the dumpdata level with model
>> ordering), but I wanted to get community approval for the general idea
>> before I did too much work on fixing the edge cases.
>>
>> So - opinions?
>
> Hi Russ,
>
> I do not like this approach. Using the output from this serializer for
> anything other than loaddata/dumpdata would be annoying. You would
> have no idea which model's fields the separators (pipe and colon) are
> delimiting. E.g. in my REST API the related models are serialized to
> JSON using my Django Full Serializer and I can just access them as
> JavaScript objects in my frontend code. Your approach would require
> repeating the parsing code in JavaScript as well as having to know
> beforehand which model each and every separator corresponded to (!...@#$
> %^&*:;.,)!!!. Might be ok for 3 models in django.contrib but not for
> hundreds of reusable apps' models that might start using this feature.
> What about separator clashes?


There are two separate problems here:
 1) "Full" serialization
 2) Fixing #7052.

As I've said before, full serialization is something I'd like to
tackle, but it's almost orthogonal to the #7052 problem. I can't think
of anything in this solution that would prohibit a later attempt to
provide full serialization - 'utilize surrogate keys' is just one
serialization strategy that could be followed (optionally, if need be)
for foreign keys.
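
To make the "one strategy among several" point concrete, here is a minimal sketch of a reference serializer that uses a surrogate key when the model defines one and falls back to the primary key otherwise. It uses plain Python classes rather than Django models, and serialize_reference() and Tag are hypothetical names for illustration, not part of the patch on #7052.

```python
# Plain-Python stand-ins for models; this is not Django's serializer API.

class ContentType:
    def __init__(self, pk, app_label, model):
        self.pk = pk
        self.app_label = app_label
        self.model = model

    def surrogate_key(self):
        # Same "app_label:model" form used in the examples quoted above.
        return '%s:%s' % (self.app_label, self.model)


class Tag:
    """Hypothetical model that does not define a surrogate key."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


def serialize_reference(obj):
    """Return the surrogate key if the model has one, else the pk."""
    surrogate_key = getattr(obj, 'surrogate_key', None)
    if callable(surrogate_key):
        return surrogate_key()
    return obj.pk


user_ct = ContentType(7, 'auth', 'user')
misc_tag = Tag(3, 'misc')

print(serialize_reference(user_ct))   # surrogate key is used
print(serialize_reference(misc_tag))  # falls back to the primary key
```

A full serializer could slot in at the same decision point, choosing to emit the whole related object instead of either key.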

I'm willing to accept the concern about separators, though. Luke's
suggestion to use tuples would fix this - instead of a permission
being rendered (in JSON, for example) as:

  permission = "add_user|auth:user"

you would get

  permission = ('add_user', ('auth', 'user'))

This avoids separator clashes, and it is parsed by the same rules as the
serialization language. It also means that the implementation for
get_by_surrogate() can be a little more meaningful - e.g., for content
type:

def get_by_surrogate(self, app_label, model):
    ...
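
As a rough sketch of how the tuple form could round-trip, the example below uses in-memory stand-ins for the two managers. Everything here except the get_by_surrogate() and surrogate_key() names is an assumption made for illustration (the list-backed managers, the DoesNotExist class, the constructor signatures); it is not the code from the patch.

```python
import json


class DoesNotExist(Exception):
    """Stand-in for a model's DoesNotExist exception."""


class ContentType:
    def __init__(self, app_label, model):
        self.app_label = app_label
        self.model = model

    def surrogate_key(self):
        return (self.app_label, self.model)


class ContentTypeManager:
    """In-memory stand-in for a manager; a real one would run a query."""

    def __init__(self, rows):
        self._rows = list(rows)

    def get_by_surrogate(self, app_label, model):
        for ct in self._rows:
            if (ct.app_label, ct.model) == (app_label, model):
                return ct
        raise DoesNotExist((app_label, model))


class Permission:
    def __init__(self, codename, content_type):
        self.codename = codename
        self.content_type = content_type

    def surrogate_key(self):
        # Nested tuple: no separators to choose, escape, or parse.
        return (self.codename, self.content_type.surrogate_key())


class PermissionManager:
    def __init__(self, rows, content_types):
        self._rows = list(rows)
        self._content_types = content_types

    def get_by_surrogate(self, codename, ct_key):
        # The nested key unpacks straight into the other manager's lookup.
        content_type = self._content_types.get_by_surrogate(*ct_key)
        for perm in self._rows:
            if perm.codename == codename and perm.content_type is content_type:
                return perm
        raise DoesNotExist((codename, ct_key))


user_ct = ContentType('auth', 'user')
content_types = ContentTypeManager([user_ct])
add_user = Permission('add_user', user_ct)
permissions = PermissionManager([add_user], content_types)

# JSON has no tuple type, so the key is written as nested lists.
encoded = json.dumps(add_user.surrogate_key())
print(encoded)  # ["add_user", ["auth", "user"]]

# Deserialize and unpack the key as positional arguments.
restored = permissions.get_by_surrogate(*json.loads(encoded))
print(restored is add_user)  # True
```

The splitting that the string form did by hand is now done by the format's own parser, and the positional unpacking is what allows a signature like get_by_surrogate(app_label, model).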

Is this any better in your opinion?

Yours,
Russ Magee %-)

