Re: GSoC Proposal: Serialization Enhancements

Russ Tue, 31 Mar 2009 23:10:23 -0700

Questions for the time-pressed:

* Have you ever needed, or can you conceive of ever wanting, to
provide multiple formats (JSON/XML/etc) for the same data? In other
words, is there a use case for easily producing different
serializations of the same data?

* If you could serialize data in whatever structure you wanted, would
you still need to deserialize it at some point, or is this type of use
more unidirectional?

On Mar 31, 7:33 am, Russell Keith-Magee <[email protected]>
wrote:
> On Tue, Mar 31, 2009 at 11:43 AM, Russ Amos <[email protected]> wrote:
>
> > Would writing an appropriate template, while certainly not ideal, provide
> > most of the functionality for the common use case being discussed?
> > [snipped]
>
> Depends on exactly what you mean by 'template'. I would expect that
> the end serialization would still occur using the underlying
> JSON/XML/YAML libraries, so you can't really use a template in the
> sense of a Django HTML template. However, if you're talking about a
> format in which you can express serialization instructions, then I
> could be convinced (but I need to see details).

I _am_ talking about a bona fide Django HTML template, but only for
the purposes of illustration.  My goal was not to use this as a part
of the proposal, but to ask if some of the flexibility provided by
Django's templating system would be useful for the serialization
changes; namely, structural logic (for/if/etc) and (possibly custom)
filters.  Again, using the template system, or inventing a new one for
serialization, is NOT what I'm suggesting.  Looking at the system as
it is, if I needed to create a custom serialization format, top to
bottom, I would write a view and template, and override the mimetype.
The docs feature instructions on producing CSV in this way [1], as an
example.  Obviously, this is not ideal, but there's also something to
be said for the flexibility, even if part of that is reinventing
wheels.  This was more a rambling brainstorm than a useful part of my
proposal...

[1] => 
http://docs.djangoproject.com/en/dev/howto/outputting-csv/#using-the-template-system

>
> > I ask not because I think that's the best solution, but obviously I need a
> > more accurate mental image of the goal, as seen by the core developers.  So
> > long as deserialization is no issue, as is typical in AJAX applications (or
> > anything where an external system is looking at an app's state), providing a
> > 'shortcut' interface to provide structure and some form of pre-processing
> > hooks seems like a good way to go.
>
> To clarify - it's not that deserialization *isn't* an issue, it's that
> deserialization isn't always possible. Django's default serializers
> have sufficient data in them to allow deserialization. That same data
> _could_ be presented in  a different format, and it would certainly be
> nifty if, in those cases, deserialization could be preserved. However,
> I accept that this is a non-trivial goal, so if it turns out to not be
> possible (or only possible under specific circumstances - such as a
> serializer explicitly marked as deserializable), then I won't lose
> much sleep.
>

Your clarification lines up with my intent, if not my expression,
quite well.  My apologies for my arbitrary use of absolutes!

> Some immediate concerns/questions:
>
>  * How do you deal with objects of different type? At present, you can
> pass a disparate list of objects to the serializer. The only
> requirement is that every element in the list is a Django object - it
> doesn't need to be a homogeneous list.

Initial thoughts are "throw an error if the attribute is missing", but
I need time to consider a generic (read: useful) solution.
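
A minimal sketch of the "throw an error if the attribute is missing" idea,
in plain Python with no Django dependency. The classes `Product` and
`Customer` and the helper `extract` are hypothetical stand-ins, not part of
Django or of the proposal.

```python
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price


class Customer:
    def __init__(self, name):
        self.name = name


def extract(obj, attribute_names):
    """Collect the named attributes, failing loudly if any are absent."""
    missing = [name for name in attribute_names if not hasattr(obj, name)]
    if missing:
        raise AttributeError(
            "%s is missing: %s" % (type(obj).__name__, ", ".join(missing))
        )
    return {name: getattr(obj, name) for name in attribute_names}


# A mixed list works as long as every object has the requested attributes.
mixed = [Product("Widget", 9.5), Customer("Ada")]
names_only = [extract(obj, ["name"]) for obj in mixed]
```

Asking the same list for "price" would raise on the Customer entry, which
is the strict behaviour described above.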

>  * How does this translate to non-JSON serializers? The transition to
> YAML shouldn't be too hard, but what about XML? How does `structure`
> get interpreted by the XML serializer? How do you differentiate
> between the element name, element attributes, and child nodes that can
> be used in XML serialization?

This is what stands out to me most, now.  I realized after climbing
into bed last night that I didn't even _consider_ XML, having
previously written it off (in my original proposal) as a format (and
therefore irrelevant) since the focus was different.  Obviously XML is
very different from JSON, and I am no longer sure that we can allow
completely arbitrary serialization structure (which is the goal) AND
maintain independence between structure and format, which I would like
to do if at all possible.  I'm not sure if there's a realistic use
case for being able to easily use one structure and multiple formats,
however.  Boiling it down to the least common denominator seems
limiting, but allowing complete flexibility could tie structure
tightly to format.

The larger question, I suppose, is do we really want to be subclassing
for structure and subclassing for format, or subclassing for structure
and format?  The former provides a certain level of an "I wrote
decoupled code" feeling, but, again, I can't find a use case for
this.  The latter feels restrictive if this use case ever does
appear.  There's also something to be said for API uniformity...  Can
a useful level of independence be achieved when the end formats are so
different?

[2] => 
http://code.djangoproject.com/browser/django/trunk/django/core/serializers/xml_serializer.py#L37
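
To make the structure/format question concrete, here is a minimal sketch
(standard-library Python only, not Django's serializer code) that renders
one format-neutral dictionary as both JSON and XML. The names `record`,
`render_json`, `render_xml`, `root_name`, and `item_name` are all
hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# A format-neutral "structure": plain keys, string values, one list.
record = {"name": "Widget", "price": "9.50", "tags": ["tools", "metal"]}


def render_json(data):
    # JSON maps directly onto dicts and lists; no extra decisions needed.
    return json.dumps(data, sort_keys=True)


def render_xml(data, root_name="object", item_name="item"):
    # XML needs answers JSON never asks for: the root element's name and
    # the tag used for each entry of a list.
    root = ET.Element(root_name)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        if isinstance(value, list):
            for entry in value:
                ET.SubElement(child, item_name).text = str(entry)
        else:
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = render_json(record)
xml_text = render_xml(record)
```

The JSON renderer needs nothing beyond the data, while the XML renderer
has to be told a root tag and a list-item tag (and this sketch does not
touch element attributes at all). That extra input is where structure and
format start to couple.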

> > Some "helpers" I think might be useful would be hooks for the various types
> > of fields, including but not limited to relations, to allow things like
> > special text processing or dependency traversal, and providing the current
> > default "structure" in case the user simply wants to do some pre-processing
> > of some form.
>
> I appreciate that this is one of those details that we will need to
> finesse with time, but it would be interesting to hear your
> preliminary thoughts on this - in particular, on how you plan to link
> the string in the 'template' to the helper.

Conversations about format complications notwithstanding, the actual
serialization process I see as iterating through the structure
attribute, converting keys to unicode, and processing the values as
follows (loosely):
- If the value is a list, and the key happens to be a relation field,
loop through everything in the list with each of the objects in the
relation.  There's a bit of a magic feel to this I don't like, so I've
got an alteration to make below [3].
- If the value is a string, follow conventions -- check if it's a
field of the model, check if it's a method of the model, check if it's
in the form "relation__field" (and "relation__relation__field" etc),
check if it's a method of the serializer, and just default to "it must
be just a string" in the end (although, might this be confusing for
debugging?). Evaluate whatever it ends up being until it, too, is a
string.
- Tack on the value to the string produced, thus far, formatting as
appropriate.
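
The lookup order for string values can be sketched roughly as follows, in
plain Python with no Django dependency. Everything here is hypothetical:
`Category` and `Product` stand in for models (with a `fields` tuple
mimicking the model's field names), and `SketchSerializer`, `resolve`, and
`to_dict` are illustrative names, not an existing or proposed API.

```python
class Category:
    def __init__(self, title):
        self.title = title


class Product:
    # Stand-in for a model; `fields` mimics the list of model field names.
    fields = ("name", "price", "category")

    def __init__(self, name, price, category):
        self.name = name
        self.price = price
        self.category = category

    def display_price(self):
        return "$%.2f" % self.price


class SketchSerializer:
    structure = {}

    def resolve(self, obj, value):
        # 1. A field of the model.
        if value in getattr(obj, "fields", ()):
            return getattr(obj, value)
        # 2. A method of the model.
        attr = getattr(obj, value, None)
        if callable(attr):
            return attr()
        # 3. A "relation__field" path, followed one step at a time.
        if "__" in value:
            target = obj
            try:
                for part in value.split("__"):
                    target = getattr(target, part)
                return target
            except AttributeError:
                pass  # not a valid path; keep trying the other conventions
        # 4. A method of the serializer, called with the object.
        method = getattr(self, value, None)
        if callable(method):
            return method(obj)
        # 5. Otherwise, treat it as a literal string.
        return value

    def to_dict(self, obj):
        # Keys and resolved values are both coerced to strings.
        return {
            str(key): str(self.resolve(obj, value))
            for key, value in self.structure.items()
        }


class ProductSerializer(SketchSerializer):
    structure = {
        "name": "name",                  # model field
        "price": "display_price",        # model method
        "category": "category__title",   # relation path
        "blurb": "shout_name",           # serializer method
        "kind": "product",               # falls through to a literal
    }

    def shout_name(self, product):
        return product.name.upper() + "!"


product = Product("Widget", 9.5, Category("Tools"))
result = ProductSerializer().to_dict(product)
```

The last entry shows the debugging concern: a misspelled field name would
silently come out as a literal string rather than raising an error.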

[3] =V
class ProductSerializer(serializers.Serializer):
    structure = {
        "name": "name",
        "price": "price",
        "description": "truncate_description"
    }

    def truncate_description(self, product):
        return product.description[:40]

class OrderSerializer(serializers.Serializer):
    structure = {
        "order_id": "pk",
        "products": "products_list",
        "total": "total_price"
    }

    def products_list(self, order):
        # Delegate each related product to ProductSerializer.
        products = order.products.all()
        return [
            serializers.serialize(self._format, product,
                                  serializer=ProductSerializer)
            for product in products
        ]

I think this is a bit more realistic a use, eliminating the magical
treatment of list elements, but isn't as ridiculously simple to
write.  Now, you have to want it.  Thoughts?

> However, here's my brain dump, such as it is:

I feel I should take a moment to thank you for taking many moments on
critiquing my proposal and providing your insightful brain dumps, so I
shall: thanks!

> My initial thought was that the serializers would end up being a lot
> like the Feeds framework - a base class with lots of
> methods/attributes that can be overridden to provide specific
> rendering behaviour. If you tear down the serialization problem, you
> end up with a set of relatively simple questions:

I've regrouped your observations so that my responses make sense.

>  * What is the top level structure (e.g., the outer [] in JSON, the
> XML header and root tag)?
>
>  * What is the wrapping structure for each element in the list of
> objects (e.g., the {} in JSON, the <object> tag in XML)
>
>  * How is that list of fields presented to the user? (fields:{} in
> JSON, child elements in XML)

The answers to these hinge on how flexible the custom serializers
should be.  If we're okay with insisting on a little bit of basic
format, we can allow the end-user more freedom with the structure.
However, to provide real flexibility in, say, the additional aspects
of XML serialization, I think we might have to force users to pick a
serialization format, perhaps with the mention that changing between
some formats is easier to do (JSON <-> YAML) than between others (JSON
-> XML).

>  * How is each field rendered? (key-value string pairs? <value>
> nodes?) If the field is itself a serializable object (e.g., another
> Django object) how is it serialized?
>
>  * What descriptive attributes exist for each element in the list?
> (pk, model name)
>
>  * How/where are these descriptive attributes rendered? ( dict
> entries? root node attributes? child nodes?)
>
>  * Which fields (including extra fields, model properties, computed
> fields, etc) should be included in the list of fields?
>
>  * Is there any optional metadata for each data field, such as
> datatype? How is that optional metadata interpreted?

I think these are all answered by the structures I've suggested, and
the existing serializers do a decent job of this, already.  If the end-
user wants to include metadata, he/she is welcome to do so.  The same
can be said of extra fields, which fields, and how to format fields.
If tweaking of a field is necessary, wrap the field in a method of the
serializer:

class MySerializer(serializers.Serializer):
    structure = {
        "field_name": "my_method"
    }

    def my_method(self, obj):
        # `obj` avoids shadowing the built-in name `object`.
        return obj.field_name + u"!"

> I was also thinking that you aren't necessarily going to be
> subclassing the serializer itself. The answers to these questions are
> really just rendering instructions that can be followed by any
> serializer, once some common ground rules are established. The
> existing serialization engine has a hard-coded set of answers; what we
> need to do is refactor those answers out into a default definition
> that can be subclassed, overridden, or rewritten to suit specific
> needs.

Yes, and on that point I want to again emphasize that I think there's
something to be said for the difference in format and structure.  If
the two can be kept separate, I would like to use a different name for
the base instruction class than "Serializer", which I've been using
only to avoid a bikeshed discussion on the name.  Perhaps
serializers.Structure?  serializers.Renderer?

> Some of the serialization instructions will be ignored by some
> renderers: for example, a 'child-name=value' attribute may be used to
> describe the fact that <value> tags are required for XML, but be
> ignored by the JSON and YAML serializer. Obviously, an important task
> here is to define what attributes are required, which are optional,
> and how they map onto each serializer.

That's an interesting point, and I can see that being very nice for
keeping the two separate, which I've been harping on all through this.

> Anyway, there's 10c worth of brain dump - make of it what you will. I
> make no claim that these ideas are watertight - I'm willing to listen
> to any reasonable counterideas or objections to what I have proposed.

10c greatly appreciated. :)

I'll stew on the observations made and pull together something more
complete than the disparate, brainstorm-style, question-and-response
mess exemplified here, but answers to standing questions would be
appreciated, if anyone has the time.

Thanks,
Russ