Re: [GSoC Proposal] Customizable Serialization

Russell Keith-Magee Wed, 30 Mar 2011 01:08:44 -0700

On Fri, Mar 25, 2011 at 6:03 PM, Vivek Narayanan <[email protected]> wrote:
>> I think I see where you're going here. However, I'm not sure it
>> captures the entire problem.
>>
>> Part of the problem with the existing serializers is that they don't
>> account for the fact that there are actually two subproblems to
>> serialization:
>>
>>  1) How do I output the value of a specific field
>>  2) What is the gross structure of an object, which is a collection of
>> fields, plus metadata about an object, plus ...
>>
>> So, for a single object the JSON serializer currently outputs:
>>
>> {
>>     "pk": 1,
>>     "model": "myapp.mymodel",
>>     "fields": {
>>         "foo": "foo value",
>>         "bar": "bar value"
>>     }
>> }
>>
>> Implicit in this format is a bunch of assumptions:
>>
>>  * That the primary key should be rendered in a different way to the
>> rest of the fields
>>  * That I actually want to include model metadata like the model name
>>  * That the list of fields is an embedded structure rather than a list
>> of top-level attributes.
>>  * That I want to include all the fields on the model
>>  * That I don't have any non-model or computed metadata that I want to 
>> include
>
> I believe that my model of using a recursive method and storing
> temporary data in 'levels' would address most of these concerns. The
> method for handling a model would consist of the following steps,
> roughly:
>
>   * Get the list of fields to be serialized.


For a suitably relaxed definition of "field". Remember, serialized
data doesn't necessarily have to come from the model -- it could come
from a related model, or be a constant, or be a computed field, or
many other options.
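A minimal sketch of that relaxed notion of "field", assuming plain Python objects rather than real Django models (the `Person`/`Book` dataclasses, `FieldSpec` and `serialize_fields` are all hypothetical): each output key maps to a callable, so a model attribute, a related-model attribute, a constant and a computed value are handled the same way.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Person:  # hypothetical stand-in for a Django model
    first_name: str
    last_name: str


@dataclass
class Book:  # hypothetical stand-in for a Django model
    title: str
    editor: Person
    authors: List[Person] = field(default_factory=list)


# A "field" is any callable mapping the object being serialized to a value.
FieldSpec = Dict[str, Callable[[Any], Any]]


def serialize_fields(obj: Any, spec: FieldSpec) -> Dict[str, Any]:
    """Evaluate each field getter against obj, in declaration order."""
    return {name: getter(obj) for name, getter in spec.items()}


book_spec: FieldSpec = {
    "title": lambda b: b.title,                      # attribute of the model itself
    "editor_surname": lambda b: b.editor.last_name,  # attribute of a related model
    "format_version": lambda b: 1,                   # a constant
    "author_count": lambda b: len(b.authors),        # a computed value
}

book = Book("Django for Dummies", Person("Alice", "Watson"),
            [Person("John", "Smith"), Person("Bob", "Jones")])
print(serialize_fields(book, book_spec))
```

Nothing in `serialize_fields` cares where a value comes from; that decision lives entirely in the spec.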

>> When you start dealing with foreign keys and m2m, you have an
>> additional set of assumptions --
>>
>>  * How far should I traverse relations?
>
> The user can specify a limit to the levels of nesting through
> variable ``max_nesting_depth``.

A simple "nesting depth" approach won't work. You really need to
handle this on a per-model basis; Mode

It might be possible to automate some of this with a simple nesting
depth definition, but there will always be a need to define the exact
rollout of a tree of serialization options.
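As a rough illustration of an explicit rollout tree (a sketch only; `Node`, `serialize` and the spec layout are hypothetical, not a proposed Django API): each relation in the spec either names a nested spec, meaning "embed it", or `None`, meaning "emit the primary key". Traversal ends wherever the tree ends, so no global depth number is needed, and a recursive relation such as an editor's coworkers (which can include the editor herself) terminates naturally.

```python
from typing import Any, Dict


class Node:
    """Hypothetical plain-Python object standing in for a model instance."""

    def __init__(self, pk: int, **attrs: Any) -> None:
        self.pk = pk
        self.__dict__.update(attrs)


def serialize(obj: Node, spec: Dict[str, Any]) -> Dict[str, Any]:
    """Serialize obj by following an explicit spec tree.

    spec["fields"] lists plain attributes to copy; spec["relations"] maps a
    relation name to a nested spec (embed) or None (emit the pk only).
    """
    out: Dict[str, Any] = {name: getattr(obj, name) for name in spec.get("fields", [])}
    for name, sub_spec in spec.get("relations", {}).items():
        value = getattr(obj, name)
        if isinstance(value, list):  # to-many relation
            out[name] = [v.pk if sub_spec is None else serialize(v, sub_spec) for v in value]
        else:  # to-one relation
            out[name] = value.pk if sub_spec is None else serialize(value, sub_spec)
    return out


alice = Node(5, firstname="Alice", lastname="Watson", coworkers=[])
alice.coworkers = [Node(1, firstname="Carol", lastname="Lee", coworkers=[]), alice]
book = Node(1, title="Django for Dummies", editor=alice, authors=[Node(3), Node(4)])

book_spec = {
    "fields": ["title"],
    "relations": {
        "authors": None,  # authors only as primary keys
        "editor": {       # the editor is embedded one level...
            "fields": ["firstname", "lastname"],
            "relations": {"coworkers": None},  # ...but her coworkers only as pks
        },
    },
}
print(serialize(book, book_spec))
```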

This is also a case where being explicit makes your life easier. If
you stop looking at "depth" as a single number specified at the top of
the tree, it becomes a lot easier to handle recursive or

>>  * Do I traverse reverse relations?
>
> In my opinion, traversing reverse relations can get really ugly at
> times, especially when there are M2M fields, foreign keys or circular
> relations involved. But there are some scenarios where the data is in
> a relatively simpler format and serializing them would be useful. To
> support this, I thought of something like this:
>
> class Srz(Serializer):
>   ...
>   reverse_relations = [ (from_model_type, to_model_type), ... ]
>
> But this should be used with caution and avoided when possible.
>
>>  * How do I represent traversed objects? As FK values? As embedded objects?
>
> As embedded objects; once the nesting depth limit is reached, as FK
> values.

My point is that this is a serialization option. You're dictating a
policy here, rather than allowing it to be a configuration option.

>>  * If they're embedded objects, how do I represent *their* traversed values?
>
> Their traversed values would be represented just as a normal model
> would be, with field-value mappings. The user can choose which fields
> to dump.

Again -- you're dictating a policy, not allowing the user to define one.

>>  * What happens with circular relations?
>
> For all model type objects, like the base model in the query set and
> all FK and M2M fields, some uniquely identifying data (like the
> primary key, content type) will be stored in a list as each one of
> them is processed. Before serializing a model, it would be checked
> against the list; if it is already there, it is a circular reference
> and that model would be ignored.
>
>>  * If I have two foreign keys on the same model, are they both
>> serialized the same way?
>
> Yes.

Why should this be the case? Again, you are dictating policy, not
allowing policy to be defined.
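To illustrate what "allowing policy to be defined" could look like for relations (a sketch with hypothetical names: the `Book` model with two foreign keys to `Person`, the policy functions and `serialize_book` are all illustrative), each foreign key gets its own representation function, so two FKs to the same model need not be serialized the same way.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Person:  # hypothetical stand-in for a Django model
    pk: int
    first_name: str
    last_name: str


@dataclass
class Book:  # hypothetical: two foreign keys pointing at the same model
    pk: int
    editor: Person
    illustrator: Person


Policy = Callable[[Person], Any]


def as_pk(p: Person) -> int:
    return p.pk


def as_full_name(p: Person) -> str:
    return f"{p.first_name} {p.last_name}"


def as_embedded(p: Person) -> Dict[str, Any]:
    return {"firstname": p.first_name, "lastname": p.last_name}


def serialize_book(book: Book, policies: Dict[str, Policy]) -> Dict[str, Any]:
    """Render each FK with its own policy; unconfigured FKs fall back to the pk."""
    return {
        name: policies.get(name, as_pk)(getattr(book, name))
        for name in ("editor", "illustrator")
    }


book = Book(1, Person(5, "Alice", "Watson"), Person(6, "Dan", "Ng"))
print(serialize_book(book, {"editor": as_embedded}))
print(serialize_book(book, {"editor": as_pk, "illustrator": as_full_name}))
```

The serializer itself makes no choice here; pk, name string or embedded object is decided per field by whoever defines the policies.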

>> When you start dealing with the XML serializer, you have all these
>> problems and more (because you have the attribute/tag distinction for
>> each of these decisions, too -- for example, I may want some fields to
>> be rendered as attributes, and some as tags).
>>
>
> For XML, I thought of using an intermediary container for a node that
> would store all these details.
>
>> > ---------------------------------------------------------
>> > New features in the serialize() function
>> > ---------------------------------------------------------
>> > Apart from the changes I’ve proposed for the ``fields`` argument of
>> > serialize, I would like to add a couple of features:
>>
>> > • An exclude argument, which would be a list of fields to exclude from
>> > the model; this would also contain the fields to exclude in related
>> > models.
>>
>> > • An extras argument, which would allow properties and data returned
>> > by some methods to be serialized.
>>
>> For me, the goal should be to deprecate these sorts of arguments. The
>> decision to include (or exclude) a particular field is a feature of
>> serialization that is intimately tied to the serialization format, not
>> something that is an external argument.
>>
> Initially, I thought the goal was not to tie a serializer down to any
> model; in that case, I can integrate these features into the serializer class.

My point is that it should be *possible* to define a "generic"
serialization strategy -- after all, that's what Django does right
now. If arguments like this do exist, they should essentially be
arguments used to instantiate a specific serialization strategy,
rather than something baked into the serialization API.
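A sketch of that distinction, under stated assumptions (plain dataclasses instead of models; `FixtureStrategy`, its `exclude`/`extras` parameters and this `serialize()` are hypothetical, not Django's API): the generic fixture-style strategy takes `exclude` and `extras` when it is instantiated, and the top-level `serialize()` knows nothing about them.

```python
import dataclasses
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence


class FixtureStrategy:
    """Hypothetical generic strategy mimicking Django's current fixture layout.

    exclude and extras configure this strategy instance only.
    """

    def __init__(self, model_label: str, exclude: Sequence[str] = (),
                 extras: Optional[Dict[str, Callable[[Any], Any]]] = None) -> None:
        self.model_label = model_label
        self.exclude = set(exclude)
        self.extras = extras or {}

    def serialize_object(self, obj: Any) -> Dict[str, Any]:
        # Every dataclass field except pk and the excluded ones goes into "fields".
        fields = {
            f.name: getattr(obj, f.name)
            for f in dataclasses.fields(obj)
            if f.name != "pk" and f.name not in self.exclude
        }
        # Extras are computed values appended after the model's own fields.
        fields.update({name: fn(obj) for name, fn in self.extras.items()})
        return {"pk": obj.pk, "model": self.model_label, "fields": fields}


def serialize(objects: Iterable[Any], strategy: FixtureStrategy) -> List[Dict[str, Any]]:
    # The API only delegates; all policy lives in the strategy instance.
    return [strategy.serialize_object(obj) for obj in objects]


@dataclasses.dataclass
class Book:  # hypothetical stand-in for library.Book
    pk: int
    title: str
    editor: int
    internal_notes: str


strategy = FixtureStrategy("library.book", exclude=["internal_notes"],
                           extras={"title_length": lambda b: len(b.title)})
result = serialize([Book(1, "Django for Dummies", 5, "draft")], strategy)
print(result)
```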

>> > -----------------------------------------------------------------
>> > Representing the existing serialization model
>> > -----------------------------------------------------------------
>> > Here is an implementation of the existing serialization format in
>> > JSON, this would be the ‘fixture’ mode that I’ve mentioned above.
>>
>> I think these examples demonstrate what I said earlier -- your
>> proposed framework allows me to customize the name given to a field in
>> XML, but doesn't allow me to change the parent of that field within
>> the broader XML structure.
>
> I'm not sure that I follow this. It would be great if you could give
> an example. As I mentioned earlier, there will be an option to
> 'flatten' the nested models, provide alternate names to the fields,
> and wrap fields into a group. Initially I had thought of adding this
> to the external ``fields`` argument in ``serialize()``, but I can add
> them to the specification object too.

The recurring theme in my comments is that you are dictating policy,
not allowing users to define policy. We already have the former; we
need the latter.

Here's an example of the problem as I see it. Consider the
serialization of a "book" model.

Option 1: Django's current serializer:

{
   "pk": 1,
   "model": "library.book",
   "fields": {
       "authors": [3, 4],
       "editor": 5,
       "title": "Django for Dummies"
   }
}

No real surprises here.

Option 2: the format required for a particular book publishing API

{
    "book details": {
        "book title": "Django for Dummies",
        "authors": [
             "John Smith",
             "Bob Jones"
        ]
    },
    "author_count": 2,
    "editor": {
        "firstname": "Alice",
        "lastname": "Watson",
        "coworkers": [1, 5],
        "contact_phone": "98761234",
        "company": {
             "name": "Mega publishing corp",
             "founded": 2010
        }
    }
}

Notable features:
 * Authors and editor are both "Person" objects, but
     - We need to serialize editors in detail, including recursive calls
     - Authors are serialized using their combined first+last name
 * "book title" is a rename of the native model field
 * "author_count" isn't on the model at all.
 * "book details" doesn't reflect any aspect of model structure --
it's entirely decoration.
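For reference, both target formats can at least be produced by hand-written plain Python from the same objects. This is a sketch, not an existing or proposed Django API: the dataclasses, function names and sample data (including the coworker with pk 1) are hypothetical. It makes visible the per-field decisions a declarative serializer definition would have to be able to express.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Company:  # hypothetical stand-in for a Django model
    name: str
    founded: int


@dataclass
class Person:  # hypothetical: used for both authors and editors
    pk: int
    first_name: str
    last_name: str
    phone: str = ""
    company: Optional[Company] = None
    coworkers: List["Person"] = field(default_factory=list)


@dataclass
class Book:  # hypothetical stand-in for library.Book
    pk: int
    title: str
    editor: Person
    authors: List[Person] = field(default_factory=list)


def fixture_format(book: Book) -> Dict[str, Any]:
    # Option 1: pk and model label outside "fields", related objects as pks.
    return {
        "pk": book.pk,
        "model": "library.book",
        "fields": {
            "authors": [a.pk for a in book.authors],
            "editor": book.editor.pk,
            "title": book.title,
        },
    }


def editor_detail(person: Person) -> Dict[str, Any]:
    # Editors get the detailed rendering, including a nested company.
    # Coworkers are emitted as pks only, so the recursion stops here.
    company = person.company
    return {
        "firstname": person.first_name,
        "lastname": person.last_name,
        "coworkers": [c.pk for c in person.coworkers],
        "contact_phone": person.phone,
        "company": {"name": company.name, "founded": company.founded} if company else None,
    }


def publishing_api_format(book: Book) -> Dict[str, Any]:
    # Option 2: same Person model, two different renderings.
    return {
        "book details": {  # pure decoration, no counterpart in the model
            "book title": book.title,  # renamed model field
            "authors": [f"{a.first_name} {a.last_name}" for a in book.authors],
        },
        "author_count": len(book.authors),  # computed, not stored on the model
        "editor": editor_detail(book.editor),
    }


alice = Person(5, "Alice", "Watson", phone="98761234",
               company=Company("Mega publishing corp", 2010))
alice.coworkers = [Person(1, "Carol", "Lee"), alice]
book = Book(1, "Django for Dummies", editor=alice,
            authors=[Person(3, "John", "Smith"), Person(4, "Bob", "Jones")])
print(fixture_format(book))
print(publishing_api_format(book))
```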

Now - show me how both of these serializers are defined using your
proposed API.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.