Re: [GSoC Proposal] Customizable Serialization

Russell Keith-Magee Thu, 24 Mar 2011 05:43:29 -0700

On Thu, Mar 17, 2011 at 3:47 PM, Vivek Narayanan <m...@vivekn.co.cc> wrote:
> Hi,
>
> This is my proposal for the customizable serialization idea:
>
> There are two formats - A formatted Google Docs version that's easy on
> the eyes ( 
> https://docs.google.com/a/vivekn.co.cc/document/pub?id=1GMWW42sY8cLZ2XRtVEDA9BQzmsqnCNULzskDMwqSUXI
> ) and a plain text version that follows.
>
> -------------------------------------------------------------------------------------------------------------------
> GSoC Proposal: Customizable Serialization for Django
>
> =======
> Synopsis
> =======
> Django provides a serialization framework that is very useful for
> loading and saving fixtures, but not very flexible if one wants to
> provide an API for a Django application or use a serialization format
> different from what is defined by Django. Also the current handling of
> foreign keys and many to many relationships is not really useful
> outside the context of fixtures.
>
> I propose a solution to this problem through a class based
> serialization framework, that would allow the user to customize the
> serialization output to a much greater degree and create new output
> formats on the fly. The main features of this framework would be:
>
>   1. It will enable the users to specify the serialization model as a
> class with configurable field options and methods, similar to Django’s
> Models API.
>   2. Specify new output formats and a greater level of  control over
> namespaces, tags and key-value mappings in XML, YAML, JSON.
>   3. Add metadata and unicode conversion methods to model fields
> through class methods.
>   4. Better handling of foreign keys and many-to-many fields  with a
> custom level of nesting.
>   5. A permission system to provide varying levels of data access.
>   6. Backward compatibility to ensure the smooth processing of
> database fixtures.
>
> =================
> Implementation Details
> =================
> ---------------------------------------
> Modes and Configurations
> ---------------------------------------
> I would like to provide building block configurations for XML, YAML
> and JSON which the user can customize, which would be based more or
> less on the existing skeletal structures in core.serialization and
> core.serialization.base. Also there will be a new Serializer
> configuration called TextSerializer that can represent any arbitrary
> format. I will be providing a ‘fixture’ mode to ensure backward
> compatibility and the seamless working of the ``loaddata`` and
> ``dumpdata`` commands.
>
> ---------------------------------------
> Adding metadata to a field
> ---------------------------------------
>
> The user can define methods beginning with “meta_” to add metadata
> about each field. And functions starting with “meta2_” can be used to
> add metadata at the model level. Here is an example:
>
> class ExampleSerializer(serializers.Serializer):
>
>        ...
>
>        def meta_foo(self, field):
>
>           '''
>
>           Extract some metadata from field and return it.
>
>           It would be displayed with the attribute ``foo``
>
>           '''
>
> Temporarily all mappings between data will be stored in a dict as
> string to object/dict mappings and would be converted to the desired
> format at the output stage.
>
> In JSON the metadata would be represented inside an object:
>
>        "key": {"foo": "bar", "value": value}
>
> instead of
>
>        "key": value
>
> In XML, two options would be provided, to represent the metadata as
> individual tags or with tag attributes, through a field option in the
> class.
>
> class Serializer(XMLSerializer):
>
>        metadata_display_mode = TAGS # or ATTRIBUTES
>
> The output would be like:
>
> <field>
>
>   <metadata1>..</metadata1>
>
>   ...
>
>   <Value>Value</Value>
>
> </field>
>
> OR
>
> <field name="" md1 = "" ... > Value </field>


What if you need to support both? e.g.,

<field foo="the foo value">
    <bar>the bar value</bar>
</field>

It seems to me that you would be better served providing a way to
annotate each individual metadata value as (and I'm bikeshedding a
name here) 'major' or 'minor'. JSON would render all metadata as
key-values, and XML can make the distinction and render minor metadata
as attributes, and major metadata as tags.
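To make that concrete, here is a minimal sketch of the major/minor idea.
It is only an illustration: MAJOR, MINOR, render_json and render_xml are
placeholder names I've made up for this example, not a proposed API, and
the metadata is passed around as a plain dict of (value, kind) pairs.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical markers; the names are placeholders, not a proposed API.
MAJOR, MINOR = "major", "minor"


def render_json(name, value, metadata):
    """JSON has no attribute/tag split, so every metadata item is a key."""
    body = {key: val for key, (val, _kind) in metadata.items()}
    body["value"] = value
    return json.dumps({name: body}, sort_keys=True)


def render_xml(name, value, metadata):
    """Minor metadata becomes attributes; major metadata becomes child tags."""
    field = ET.Element("field", {"name": name})
    for key, (val, kind) in metadata.items():
        if kind == MINOR:
            field.set(key, val)
        else:
            ET.SubElement(field, key).text = val
    ET.SubElement(field, "value").text = value
    return ET.tostring(field, encoding="unicode")


metadata = {"foo": ("the foo value", MINOR), "bar": ("the bar value", MAJOR)}
json_output = render_json("title", "Hello", metadata)
xml_output = render_xml("title", "Hello", metadata)
```

The same metadata dict feeds both renderers; only the XML renderer needs
to look at the major/minor annotation.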

> To select which fields would have which metadata, the arguments should
> be passed in the ``serialize()`` method as:
>
>        data = ExampleSerializer.serialize(queryset,
>                fields=('field1', ('field2', ['foo'])))
>
> Each field can be specified in two ways:
>
> 1. As a string:-> no metadata will be added.
>
> 2. As a 2-element tuple, with the first element a string representing
> field name and the second a list of strings representing the metadata
> attributes to be applied on that field.
>
> Instead of manually specifying the attributes for each field, the user
> can add all metadata functions for all the fields using the
> ``use_all_metadata`` parameter in ``serialize()`` and setting it to
> True.
>
> The existing implementation of ``model.name`` and ``model.pk`` can be
> described using “meta2_” functions. These will be provided as
> ``meta2_name`` and ``meta2_pk`` to facilitate loading and dumping of
> fixtures.
>
> ---------------------------------------------------
> Datatypes and Unicode conversion
> ---------------------------------------------------
>
> The user can specify the protected types (the types that will be
> passed “as is” without any conversion) as a field variable.
>
> The unicode conversion functions for each type can be specified as
> methods - “unicode_xxx”, where 'xxx' represents the type name. If no
> method is provided for a type, a default conversion function will be
> used.
>
> class Example(Serializer):
>
>        ...
>
>        protected_types = (int, str, NoneType, bool)
>
>        ...
>
>        def unicode_tuple(self, object):
>
>                   # Do something with the object
>
> -------------------------------------------------
> Output formatting and conversion
> -------------------------------------------------
> The user can specify the format of the output , the grouping of
> fields, tags, namespaces, indentation and much more. Here are some
> examples:
>
> 1. For text based serializers a custom template would be provided:
>
> class Foobar(TextSerializer):
>
>        field_format = "%(key)s :: { %(value)f, %(meta_d1)s, %(meta_d2)s}"
>
>        ## Simple string template; meta_xxx would be replaced by
>        ## meta_xxx(field), as I’ve mentioned above.
>
>        #The three parameters below are required for text mode
>
>        field_separator = ";"
>
>        wrap_begin = "[[" # For external wrapping structure
>
>        wrap_end = "]]"
>
>        indent = 4 # indent by 4 spaces, each level. Default is 0.
>
> 2. For markup based serializers, users can provide strings for the tag
> names of fields, field values and models.
>
> class XMLFoo(XMLSerializer):
>
>        mode = "xml"
>
>        indent = 2
>
>        metadata_display_mode = TAGS
>
>        # Now all fields will be rendered as <object>...</object>
>        field_tag_name = "object"
>
>        model_tag_name = "model"
>
>        value_tag_name = "value"
>
>        ## If metadata_display_mode is set to ``TAGS``, this sets the
>        ## tag name of the value of the model field.
>
> 3. A class field ``wrap_fields`` will be provided to wrap all fields
> of a model into a group, as it is done now. If ``wrap_fields`` is set
> to “all_fields”, for example, then all the fields would be serialized
> inside an object called “all_fields”. If ``wrap_fields`` is not set,
> there will be no grouping.
>
> ---------------------------------------
> Related models and nesting
> ---------------------------------------
>
> I will modify the current “start_object -> handle_object ->
> end_object” sequence with a single method for handling a model, so
> that related models can be handled easily using recursion. An option
> of ``nesting_depth`` would be provided to the user as a field
> variable. Default value would be 0, as it is currently. Serializing
> only specific fields of related models can be done by using the fields
> argument in the call to serialize. A related model would be
> represented as “Model_name.field_name” instead of just “field_name”.
>
> Instead of the list -  ``_current``, I would be using separate lists
> for each level of nesting.

I think I see where you're going here. However, I'm not sure it
captures the entire problem.

Part of the problem with the existing serializers is that they don't
account for the fact that there are actually two subproblems to
serialization:

 1) How do I output the value of a specific field?
 2) What is the gross structure of an object, which is a collection of
fields plus metadata about the object?

So, for a single object the JSON serializer currently outputs:

{
    "pk": 1,
    "model": "myapp.mymodel",
    "fields": {
        "foo": "foo value",
        "bar": "bar value"
    }
}

Implicit in this format is a bunch of assumptions:

 * That the primary key should be rendered in a different way to the
rest of the fields
 * That I actually want to include model metadata like the model name
 * That the list of fields is an embedded structure rather than a list
of top-level attributes.
 * That I want to include all the fields on the model
 * That I don't have any non-model or computed metadata that I want to include

When you start dealing with the XML serializer, you have all these
problems and more, because you have the attribute/tag distinction for
each of these decisions, too -- for example, I may want some fields to
be rendered as attributes, and some as tags.

When you start dealing with foreign keys and m2m, you have an
additional set of assumptions --

 * How far should I traverse relations?
 * Do I traverse reverse relations?
 * How do I represent traversed objects? As FK values? As embedded objects?
 * If they're embedded objects, how do I represent *their* traversed values?
 * What happens with circular relations?
 * If I have two foreign keys on the same model, are they both
serialized the same way?

And so on.
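As an illustration of just two of those questions (traversal depth and
circular relations), here is a minimal sketch. Node is a stand-in
dataclass, not a Django model, and the policy shown (embed until the
depth runs out or a cycle is hit, then fall back to the primary key) is
one assumption among many possible answers. It also assumes primary
keys are unique across everything being serialized.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a model instance; not a Django model."""
    pk: int
    name: str
    related: dict = field(default_factory=dict)  # relation name -> Node


def serialize(obj, depth, _seen=frozenset()):
    """Embed related objects up to `depth` levels; fall back to the pk
    when the depth runs out or the target is already being serialized."""
    in_progress = _seen | {obj.pk}
    data = {"pk": obj.pk, "name": obj.name}
    for rel_name, target in obj.related.items():
        if depth > 0 and target.pk not in in_progress:
            data[rel_name] = serialize(target, depth - 1, in_progress)
        else:
            data[rel_name] = target.pk
    return data


author = Node(1, "author")
book = Node(2, "book", {"author": author})
author.related["favourite"] = book  # circular relation

deep = serialize(book, depth=5)
shallow = serialize(book, depth=0)
```

With depth=5 the author is embedded but its link back to the book is
rendered as a bare pk; with depth=0 the author itself is a bare pk.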

There are some promising aspects to your proposal -- for example, the
datatype conversion and field output ideas seem sound (although as
Andrew noted, they may need a little more elaboration with regards to
non-simple datatypes -- especially datetimes and Geo values).

However, I'm not sure you've fully captured the gross serialization
structure problem. This is the real driving reason for introducing a
broader serialization framework -- to give complete flexibility of the
serialization process to the end user.

> ---------------------------------------------------------
> New features in the serialize() function
> ---------------------------------------------------------
> Apart from the changes I’ve proposed for the ``fields`` argument of
> serialize, I would like to add a couple of features:
>
> • An exclude argument, which would be a list of fields to exclude from
> the model, this would also contain the fields to exclude in related
> models.
>
> • An extras argument, which would allow properties and data returned
> by some methods to be serialized.

For me, the goal should be to deprecate these sorts of arguments. The
decision to include (or exclude) a particular field is a feature of
serialization that is intimately tied to the serialization format, not
something that is an external argument.
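Something along these lines is what I have in mind. This is a sketch
only: BaseSerializer, and the `fields` and `exclude` class attributes,
are hypothetical names, and the object is a plain dict to keep the
example self-contained.

```python
class BaseSerializer:
    """Hypothetical base class; `fields` and `exclude` are assumed names."""
    fields = None   # None means "every field on the object"
    exclude = ()

    def serialize(self, obj):
        names = self.fields if self.fields is not None else list(obj)
        return {n: obj[n] for n in names if n not in self.exclude}


class PublicUserSerializer(BaseSerializer):
    # The format itself states what is left out; callers pass nothing extra.
    exclude = ("password",)


user = {"username": "alice", "password": "secret", "email": "a@example.com"}
public = PublicUserSerializer().serialize(user)
```

The call site carries no include/exclude arguments; choosing a
serializer class is choosing the format.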

> -----------------------------------
> Permission Framework
> -----------------------------------

I'm not sure I see the value in this bit -- at least, not as a
baked-in feature of the serialization framework. A serialization
format encompasses "what should I output"; if you've defined a
sufficiently flexible framework, it should be possible to introduce
permission checks without needing to embed them into the base
serialization framework -- they should just be a set of specific
decisions made by a specific serializer.

In fact -- this may be a good test of your proposed API: Could a third
party write a serializer that prohibited serialization of certain
attributes, or modified the serialization of certain attributes, based
on a check of Django's permissions? Personally, I don't see this as a
core requirement, but demonstrating that it is possible in principle
would be a compelling argument for your API.
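For example, a third-party serializer might look something like the
sketch below. Everything here is an assumption made for illustration:
the Serializer base class, the include_field hook and the
"hr.view_salary" permission name are invented, and StubUser only mimics
the has_perm() method that Django's User objects provide.

```python
class Serializer:
    """Hypothetical base class with one overridable hook per field."""
    fields = ()

    def include_field(self, obj, name):
        return True

    def serialize(self, obj):
        return {n: obj[n] for n in self.fields if self.include_field(obj, n)}


class PermissionSerializer(Serializer):
    """Third-party subclass: drops fields the user may not view."""
    fields = ("title", "salary")
    required_perms = {"salary": "hr.view_salary"}  # made-up permission name

    def __init__(self, user):
        self.user = user

    def include_field(self, obj, name):
        perm = self.required_perms.get(name)
        return perm is None or self.user.has_perm(perm)


class StubUser:
    """Stand-in exposing has_perm(), as Django's User objects do."""
    def __init__(self, perms):
        self.perms = set(perms)

    def has_perm(self, perm):
        return perm in self.perms


row = {"title": "Engineer", "salary": 100}
for_staff = PermissionSerializer(StubUser(["hr.view_salary"])).serialize(row)
for_guest = PermissionSerializer(StubUser([])).serialize(row)
```

The base class knows nothing about permissions; the check lives
entirely in the subclass's hook.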

> -----------------------------------------------------------------
> Representing the existing serialization model
> -----------------------------------------------------------------
> Here is an implementation of the existing serialization format in
> JSON, this would be the ‘fixture’ mode that I’ve mentioned above.

I think these examples demonstrate what I said earlier -- your
proposed framework allows me to customize the name given to a field in
XML, but doesn't allow me to change the parent of that field within
the broader XML structure.

> ===================
> Deliverables and Timeline
> ===================
>
> I would be working for about 40-45 hours each week and I would be
> writing tests, exceptions and error messages along with development.
> This would more or less be my timeline:

Broadly, this timeline looks like a good start. It certainly
provides enough detail to demonstrate that you've thought about your
project and its needs and dependencies.

One suggestion -- what isn't clear from this timeline is when we will
start to see concrete evidence of your progress. From a broad project
management perspective, it would be good to see some concrete
deliverables in your timeline -- e.g., at the end of week 2, it will
be possible to serialize a simple object with integer and string
attributes into a configured JSON structure; by week 4, it will be
possible to use the same structure with XML; and so on.

So -- in summary -- this is a promising start. You've clearly given
the problem some serious thought, but some more serious thought is
needed. I look forward to seeing what you can do with the next
iteration.

Yours,
Russ Magee %-)

