[GSOC 2012] Customizable serialization
Hi,

My name is Piotr Grabowski. I'm a final-year student at the Institute of Computer Science, University of Wrocław (Poland). I would like to share a draft of my GSoC proposal with you: http://pastebin.com/ePRUj5HC

PS. Sorry for my poor English :/

-- Piotr Grabowski

-- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: [GSOC 2012] Customizable serialization
On 20.03.2012 22:00, Łukasz Rekucki wrote:

> 1) The Meta.structure thing looks like a non-starter to me. It's a DSL inside a DSL. I'm also not sure what it actually does - you already have all those @attribute decorators, why repeat their names in some string?

One of my principles was to let the user define any possible structure.

Ex. Django

1. With Meta.structure you can do:

    def name
        return Django
    structure="a[b[c[name__field]]]"

2. Without:

    def a
        return BFieldSerializer
    class BFieldSerializer
        def b ...

You see my point? I agree that the second solution is more elegant, but the first is a lot faster. The question is whether anyone actually wants or needs to define a structure tree like this.

Ex2. ... ...

1. structure="model_field1__field model_field2__field special_model_fields{model_field3__field model_field4__field}"
   or
   structure="__fields special_model_fields{model_field3__field model_field4__field}"

2. Even if model_field1/2 are automatically put in the right place, what to do with 3/4?

    def special_model_fields
        return {'model_field_3' : model_field_3, model... }

Hmm, I wanted to prove that structure would be better in this case, but I came up with the idea above :) If Serializer methods can return base type objects, FieldSerializers, [] and {}, we can define anything :) And it's a lot better than structure! I must rewrite my proposal :)

> 3) Did you think about splitting the serialization process into two parts (dehydration + presentation)? This is what most REST frameworks do. First you serialize the objects into Python native types, then render them to any format.

Yes, in my solution everything at the end of the first phase will be a Python base type, a BaseFieldSerializer subclass or a BaseModelSerializer subclass with resolved Meta.structure. If I remove Meta.structure it will be even simpler. I can resolve Base(Model/Field)Serializer only when I know which format it will be serialized to.

-- Piotr Grabowski
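A minimal sketch of the two-part split discussed above (dehydration into native types, then presentation in a concrete format). The names `Article`, `dehydrate`, `render_json` and `render_xml` are hypothetical and are not taken from the proposal or from Django; the point is only that both renderers consume the same intermediate dict.

```python
import json
from datetime import date
from xml.etree import ElementTree as ET


class Article:
    """Plain stand-in for a model instance (hypothetical)."""

    def __init__(self, title, published):
        self.title = title
        self.published = published


def dehydrate(obj):
    """Phase 1: reduce an object to Python native types only."""
    native = {}
    for name, value in vars(obj).items():
        # Dates are not directly encodable, so turn them into ISO strings here.
        native[name] = value.isoformat() if isinstance(value, date) else value
    return native


def render_json(native):
    """Phase 2 (JSON): encode the native dict."""
    return json.dumps(native, sort_keys=True)


def render_xml(native, root_name="object"):
    """Phase 2 (XML): one child element per key."""
    root = ET.Element(root_name)
    for name, value in sorted(native.items()):
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


article = Article("Hello", date(2012, 3, 20))
native = dehydrate(article)
print(render_json(native))
print(render_xml(native))
```

Neither renderer knows anything about `Article`; that separation is what makes it possible to add a format without touching the first phase.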
Re: [GSOC 2012] Customizable serialization
ld):
        return smart_unicode(obj._meta) #no need of hydrate__value__

class JSONSerializer(ModelSerializer):
    pk = PKField(attribute=True)
    model = ModelField(attribute=True)

    class Meta:
        aliases = {'__fields__' : 'fields'}
        relation_serializer = FlatSerializer

class XMLSerializer(JSONSerializer):
    class Meta:
        aliases = {'__fields__' : 'field'}
        default_field_serializer = XMLFieldSerializer
        default_relation_serializer = XMLFlatRelationSerializer

class XMLFieldSerializer(Field):
    @attribute
    def name(self, name, obj):
        ...

    @attribute
    def type(self, name, obj):
        ...

class XMLFlatRelationSerializer(Field):
    @attribute
    def to ...

    @attribute
    def name ...

    @attribute
    def rel ...

- Schedule -

I want to work approximately 20 hours per week: 15 hours writing code and the rest on tests and documentation.

Before start: Discussion on API design. I hope everything will be clear before I start writing code.
Week 1-2: Developing base code for Serializer.
Week 3-4: Developing first phase of serialization.
Week 5: Developing second phase of deserialization.
Week 6: Developing second phase of serialization and first of deserialization.

It's time for the mid-term evaluation. I will have a working Serializer, except for nested relations.

Week 7-8: Handling nested ForeignKeys and M2M fields.
Week 9: Developing old serialization in the new API with backward compatibility.
Week 10: Regression tests, writing documentation.
Week 11-12: Buffer weeks.

- About -

My name is Piotr Grabowski. I'm a final-year student at the Institute of Computer Science, University of Wrocław (Poland). I've been working with Django for 2 years. Python is my preferred programming language, but I have also been using Ruby (& Rails) and JavaScript.

-- Piotr Grabowski
Re: [GSOC 2012] Customizable serialization
> ust have 'fields', 'include', 'exclude'. If fields is None then use the 'default set of fields' + 'include' - 'exclude'. If fields is not None, use that and ignore include/exclude.

Attribute is for XML attributes ...

> * I wouldn't consider special casing for XML serialization in the complex<->native stage. Sure, yeah, make sure there's an XML implementation that can handle the current Django XML serialization structure, but anything more than that and you're likely to end up muddying the API for a special case of data format.

> * 'relation_reserialize' - Why is that needed?

    class Photo
        sender = User
        person_on_photo = User

If p.sender == p.person_on_photo, maybe we want to serialize it twice, or maybe we want only sender : {serialized_sender}, person_on_photo : 10

> * 'object_name' - It's not obvious to me if that's necessary or not.

Now every serialized object in XML (in root) is ... What if we want ...? We use object_name="obj".

> * "In what field of serialized input is stored model class name" - What about when the class name isn't stored in the serialization data?

The first problem is what type of object is in the serialized input. There are two ways to find it: you can pass the Model class as an argument to serialization.serialize, or specify in Meta.model_name which field contains the type information.

> * "dehydrate__xxx redefining serialization for type xxx." I'm not convinced about that - it's not very pythonic to rely on type hierarchy in preference to duck typing.

Suppose our model has 10 DateTimeFields and we want to serialize only the date. We use dehydrate__datetime to do it.

> Cheers, Tom

Thanks for your reply.

-- Piotr Grabowski
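The fields/include/exclude rule quoted at the top of this message can be sketched as a small function. This is only an illustration of the rule as stated; `resolve_fields` and its parameter names are hypothetical, and applying 'exclude' to the 'include' list as well is an assumption.

```python
def resolve_fields(default_fields, fields=None, include=(), exclude=()):
    """Return the ordered list of field names to serialize."""
    if fields is not None:
        # An explicit 'fields' wins; include/exclude are ignored.
        return list(fields)
    excluded = set(exclude)
    # default set of fields + include - exclude, keeping declaration order.
    resolved = [name for name in default_fields if name not in excluded]
    for name in include:
        if name not in excluded and name not in resolved:
            resolved.append(name)
    return resolved


print(resolve_fields(["id", "title", "body"], include=["author"], exclude=["body"]))
print(resolve_fields(["id", "title"], fields=["title"], include=["x"], exclude=["title"]))
```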
[GSoC] Customizable Serialization check-in
Hi!

I'm Piotr Grabowski, a student from the University of Wroclaw, Poland. During this Google Summer of Code I will deal with the problem of customizable serialization in Django. You can find my proposal here: https://gist.github.com/2319638 It's obviously not a finished idea; it certainly needs to be simplified. My mentor, Russell Keith-Magee, told me to look at Tom Christie's serialization API. I found it similar to my proposal. There is a lot in common (declarative fields, the same approach to various aspects of serialization), but his API is simpler and feels better. Since Tom has already posted to the group about his project, I can refer to it:

On 27.04.2012 06:44, Tom Christie wrote:

> ... Given that Piotr's GSoC proposal has now been accepted, I'm wondering what the right way forward is? I'd like to continue to push forward with this, but I'm also aware that it might be a bit of an issue if there's already an ongoing GSoC project along the same lines? Having taken a good look through the GSoC proposal, it looks good, and there seems to be a fair bit of overlap, so hopefully he'll find what I've done useful, and I'm sure I'll have plenty of comments on his project as it progresses. I'd consider suggesting a collaborative approach, but the rules of the GSoC wouldn't allow that right?

Like I said above, your work will be very useful for me. I must read the GSoC regulations carefully, but collaborating on writing the code is surely impossible. I don't know whether I could use your existing code base, but I think that's also impossible. However, sharing ideas and discussing how the API should look and work would be very welcome.

My plan for the next few weeks is to meet the Django contribution requirements, solve a ticket to prove I know the process of doing it, and, most importantly, have a discussion about the serialization API. I hope the community will be interested in this feature. After the weekend I will post my proposal with updates from Tom's API.
-- Piotr Grabowski
Re: Customizable Serialization check-in
On 27.04.2012 10:36, Anssi Kääriäinen wrote:

> On Apr 27, 11:14 am, Piotr Grabowski wrote:
>> Hi! I'm Piotr Grabowski, student from University of Wroclaw, Poland In this Google Summer of Code I will deal with problem of customizable serialization in Django. You can find my proposal here https://gist.github.com/2319638
>
> I quickly skimmed the proposal and I noticed speed/performance wasn't mentioned. I believe performance is important in serialization and especially in deserialization. It is not the number one priority item, but it might be worth it to write a couple of benchmarks (preferably to djangobench [1]) and check that there are no big regressions introduced by your work. If somebody already has good real-life testcases available, please share them...
>
> - Anssi
>
> [1] https://github.com/jacobian/djangobench/

I didn't think much about performance. There will be regressions. Right now serialization is very simple: iterate over the fields, transform each one into a string (or something serializable), and serialize it with json|yaml|xml. In my approach it is: transform the (Model) object into a Serializer object, where each field of the original object is a FieldSerializer object; next (maybe recursively) get a native Python type object from each field; then serialize it with json|yaml|xml. I can do some optimizations in this process, but it's clear that it will take longer to serialize (and deserialize) an object than it does now. The time taken by tests could become a problem if there are a lot of fixtures. I will try to write good, fast code, but I will be very glad if someone gives me tips about performance bottlenecks in it.

-- Piotr Grabowski
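To make the expected overhead concrete, here is a rough micro-benchmark sketch using the standard `timeit` module. It is not a djangobench benchmark and does not use the real serializer code; `Point`, `FieldWrapper`, `direct` and `layered` are hypothetical stand-ins for "encode the fields directly" versus "wrap every field in an object first".

```python
import json
import timeit


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


def direct(obj):
    # Current style: field values go straight to the encoder.
    return json.dumps(vars(obj), sort_keys=True)


class FieldWrapper:
    """Hypothetical per-field serializer object."""

    def __init__(self, name):
        self.name = name

    def to_native(self, obj):
        return getattr(obj, self.name)


def layered(obj):
    # Proposed style: one wrapper per field, then native types, then encoding.
    wrappers = [FieldWrapper(name) for name in vars(obj)]
    native = {wrapper.name: wrapper.to_native(obj) for wrapper in wrappers}
    return json.dumps(native, sort_keys=True)


p = Point(1, 2)
direct_time = timeit.timeit(lambda: direct(p), number=10000)
layered_time = timeit.timeit(lambda: layered(p), number=10000)
print("direct: %.4fs, layered: %.4fs" % (direct_time, layered_time))
```

Both functions produce the same string, so any difference in the printed timings comes from the extra object layer; the actual numbers depend on the machine and are not shown here.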
Re: Customizable Serialization check-in
On 27.04.2012 12:39, Tom Christie wrote:

> Hey Piotr,
>
>> I quickly skimmed the proposal and I noticed speed/performance wasn't mentioned. I believe performance is important in serialization and especially in deserialization.
>
> Right. Also worth considering is making sure the API can deal with streaming large querysets, rather than loading all the data into memory at once. (See also https://code.djangoproject.com/ticket/5423)
>
> - Tom.

Maybe it can be done with a chain of two black-box generators. The first generator takes a queryset (an iterable sequence) and a user-defined Serializer class that describes how to transform a single object, and it outputs Python primitive type objects. The second is fed with these objects and outputs serialized_string. What about nested objects - more generators? Generators are good because we can also reuse Serializer objects == better performance. But like Anssi said - optimize after the code is written, not before :)

-- Piotr Grabowski
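The chain of two generators described above can be sketched as follows. This is only an illustration under stated assumptions: `to_native`, `to_json_chunks` and `serialize_point` are hypothetical names, and a plain generator of tuples stands in for a queryset.

```python
import json


def to_native(objects, serialize_one):
    """First generator: object -> Python primitive types, one at a time."""
    for obj in objects:
        yield serialize_one(obj)


def to_json_chunks(natives):
    """Second generator: primitive types -> pieces of the serialized string."""
    yield "["
    for index, native in enumerate(natives):
        if index:
            yield ", "
        yield json.dumps(native, sort_keys=True)
    yield "]"


def serialize_point(point):
    # Stand-in for a user-defined Serializer working on a single object.
    x, y = point
    return {"x": x, "y": y}


# The source is itself lazy, so no full list is ever held in memory.
points = ((n, n * n) for n in range(3))
chunks = to_json_chunks(to_native(points, serialize_point))
output = "".join(chunks)
print(output)
```

In a streaming setting the chunks would be written to a file or response as they are produced instead of being joined; the join here is only to show the final result.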
Re: Customizable Serialization check-in
Hi,

I had a lot of work this week, so I didn't manage to present my revised proposal on Monday like I said. Sorry. I have it now: https://gist.github.com/2597306

I hope there will be some discussion about my proposal next week. I will also think about how it should be done under the hood. There should be some internal API. I should also resolve one Django ticket. I am thinking about this one: https://code.djangoproject.com/ticket/9279 It will be good for test cases in my future solution.

Should I write my proposal on this group? On GitHub I have nice formatting, and in this group my Python code was badly formatted.

-- Piotr Grabowski
Re: Customizable Serialization check-in
> The big difference between XML and JSON is that XML allows for values to be packed as attributes. I can see that you've got an 'attribute' argument on a Field, but it isn't clear to me how JSON would interpret this, or how XML would interpret:

I have thought about this a lot. I have two ideas: JSON will drop fields with attribute=True, or JSON will treat them like any other field. The second is better in my opinion.

> - A Field that had multiple sub-Fields, all of which were attribute=True
> - A Field that had multiple sub-Fields, several of which were attribute=False
> - The difference between these two definitions by your formatting rules:
>
> subval

    key = KeyField()

    class KeyField(Field):
        attr1 = A1Field(attribute=True)
        attr2 = A2Field(attribute=True)

        def field_name(self, obj, field_name):
            return 'subkey'

        def serialize_field_value(self, obj, field_name):
            return 'subval'

This will work in XML and in JSON.

> main value

    class KeyField(Field):
        attr1 = A1Field(attribute=True)
        attr2 = A2Field(attribute=True)

        def serialize_field_value(self, obj, field_name):
            return 'main_value'

This works in XML but fails in JSON:

    key : { attr1 : 'val1', attr2 : 'val2', ? : 'main_value' }

It must raise an exception. I don't know if this is acceptable: the same Field will work in XML and fail in JSON. This is not the fault of the XML attribute. We can fix it by dropping attributes in JSON and ensuring that if subfields are declared in a field (and at least one of them has attribute=False), then field_name must also be declared.

> In particular, why is the top level structure of the JSON serializer handled with nested Serializers, but the structure of the XML serializer is handled with nested Fields?

I don't understand you. The XML serializer is also handled with a Serializer:

    class XMLDumpDataSerializer(YJDumpDataSerializer)

YJDumpDataSerializer is the JSON serializer, and it is a Serializer.

> Yours, Russ Magee %-)

-- Piotr Grabowski
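The JSON side of the two KeyField cases above can be sketched like this, assuming the second idea (JSON treats attributes like any other key). `field_to_json_native` is a hypothetical helper, not part of the proposal; it only shows why a field with attributes needs a name for its main value.

```python
def field_to_json_native(attributes, value, value_name=None):
    """Combine a field's attributes and main value into a JSON-ready structure."""
    if not attributes:
        # No attributes: a bare value is fine, a named value becomes a small dict.
        return value if value_name is None else {value_name: value}
    if value_name is None:
        # key : { attr1 : ..., attr2 : ..., ? : value } has no valid JSON form.
        raise ValueError("a field with attributes needs a name for its main value")
    native = dict(attributes)
    native[value_name] = value
    return native


attrs = {"attr1": "val1", "attr2": "val2"}
print(field_to_json_native(attrs, "subval", "subkey"))

try:
    field_to_json_native(attrs, "main_value")
except ValueError as error:
    print("rejected:", error)
```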
Re: Customizable Serialization check-in
> None is handled for different types, down to making sure you preserve the correct field ordering across each of json/yaml/xml. I *think* that getting the details of all of those will end up being awkward to express using your current approach. The second approach would be to a dict-like format, that can easily be encoded into json or yaml, but that can also include metadata specific to particular encodings such as xml (or perhaps, say, html). You'd have a generic xml renderer, that handles encoding into fields and attributes in a fairly obvious way, and a dumpdata-specific renderer, that handles the odd edge cases that the dumpdata xml format requires. The dumpdata-specific renderer would use the same intermediate data that's used for json and yaml.

I can't agree with that. The differences between the existing XML and JSON serializer output formats are too big. There is a 'fields' field in JSON and 'field' in XML. XML has attributes and JSON does not. That is only presentation, and these two cases could be handled in the second phase (in the renderer). But there is one big difference: XML has the additional fields 'to', 'rel' and 'type', and these are not presentation. They are information.

The next (and maybe most important) thing to consider is what the user should know about formats to be able to serialize his data. In your approach the user should be familiar with, for example, SimpleXMLGenerator, because if he wants xml ... ... and json { items : [ ..., ...], } then he must write at least one renderer to transform 'items' to 'item', like you did in DumpDataXMLRenderer in django-serializers. I can't accept that. Don't get me wrong, I adopted a lot of your ideas from django-serializers and I think it is a very good project. But you shouldn't force users to know anything about generating XML or any other format.

Maybe you should create some metalanguage that lets the user say what he wants, like: "I want the field 'items' to be transformed to 'item' in xml (but I don't know how to do it)" ->

    class DumpDataSerializer(ModelSerializer):
        """
        A serializer that is intended to produce dumpdata formatted structures.
        """
        renderer_options = {
            'xml': {'transform': {'fields': 'field'}},
        }

It's ugly, but I hope you understand my idea.

> I hope all of that makes sense, let me know if I've not explained myself very well anywhere.
>
> Regards, Tom

-- Piotr Grabowski
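One possible reading of the declarative option above is a generic key-renaming pass that a renderer applies before encoding. This is a sketch under that assumption only; `apply_transform` is a hypothetical helper and is not part of django-serializers or the proposal.

```python
def apply_transform(native, renderer_options, format_name):
    """Rename dict keys according to the 'transform' map of one format."""
    renames = renderer_options.get(format_name, {}).get("transform", {})
    if isinstance(native, dict):
        return {
            renames.get(key, key): apply_transform(value, renderer_options, format_name)
            for key, value in native.items()
        }
    if isinstance(native, list):
        return [apply_transform(item, renderer_options, format_name) for item in native]
    return native


renderer_options = {"xml": {"transform": {"fields": "field"}}}
data = [{"pk": 1, "fields": {"title": "Hello"}}]

print(apply_transform(data, renderer_options, "xml"))
print(apply_transform(data, renderer_options, "json"))
```

With something like this the user only declares the rename; the knowledge of how XML is generated stays inside the renderer.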
Re: Customizable Serialization check-in
Hi,

This week I thought about the internal API for Serializer. I want developers to be able to use it eventually for better customization of their solutions. Next week I must study for my exams, so I suppose I will not do much on the serialization project. I will try to resolve some of the issues with my API that Tom Christie pointed out. I know that I haven't done much, but at the end of the semester I have many tasks related to my studies. After the end of May I will have much more time.

-- Piotr Grabowski
Re: Customizable Serialization check-in
Hi,

This week I was focused on my exams. Now I have more time for the serialization project. Sadly, the API isn't finished yet. According to the GSoC calendar, 21 May is when coding starts. Tomorrow I will send updates to the API proposal and present my idea of the algorithm (maybe "list of steps" would be a better name) used for serialization. On Wednesday 23 May I want to start coding, and on Saturday 27 May I will write the next check-in and present my initial code. The first things I want to code are the basis for the serializers.serializer method and the Serializer and Field classes. After the first two weeks I want to be able to serialize very simple objects to JSON. Like I wrote in my first proposal, I'm ready to spend 20 hours per week on this. In the first two weeks it will be less because of my studies.

-- Piotr Grabowski
Re: Customizable Serialization check-in
I made some changes to my previous API (https://gist.github.com/2597306 <- changes are included):

* Which fields of an object are serialized by default. It depends on include_default_field, but in contrast to Tom Christie's solution its default value is True, so all fields (optionally those specified in Meta.model_fields) are present.
* follow_object attribute. In short: which object a Serializer's child Serializer should work on. Tom wrote about this in a previous mail, but I didn't fully understand the problem, so I gave him a bad answer. It's better described in the algorithm I present below.
* Got rid of the aliases and preserve_field_ordering fields.
* Changed the class hierarchy:

    class Serializer(object)            # base class for serializing
    class Field(Serializer)             # class for serializing fields in objects
    class ObjectSerializer(Serializer)  # class for serializing objects
    class ModelSerializer(Serializer)   # class for serializing Django Models

I prepared a list of steps for the first phase of serialization. It's written in English-Python pseudocode :) I hope the indentation will be preserved. Serializer.serialize is a function that, for an object, returns a dict of Python native datatypes.

(Object|Model)Serializer.serialize(object, field_name (can be None), **options)

1. Get object
    1.1. If object is iterable then do this algorithm for all elements and return the list of returned values
    1.2. If field_name for object is set, from the upper level we have object Obj:
        1.2.1. If Meta.follow_object == True then work on object Obj.field_name
        1.2.2. Else work on Obj
2. Find all fields Fs that should be serialized
    2.1. Get all fields declared in Serializer
    2.2. Get all fields from Meta.fields
    2.3. If Meta.include_default_fields = True then get all fields whose type is valid in Meta.model_fields and which are not in Meta.exclude
3. Create dictionary A and for F in Fs:
    3.1. Find serializer for F
        3.1.1. If F is declared in Serializer then the serializer is explicitly declared
        3.1.2. Else get the serializer for F's type (m2m, related, etc.)
    3.2. Save in dictionary A[field_name] = serializer_value
        3.2.1. If field has label set then field_name = label
        3.2.2. If field has attribute=True set then add this to dictionary A[__attributes__][field_name] = serializer_value
4. Return A

Field.serialize(object, field_name (can be None), **options)

1. Get object
    1.1. If it is iterable then do this algorithm for all elements
    1.2. Work on object Obj passed from the upper level
2. Find all fields Fs that should be serialized
    2.1. Get all fields from declared fields
3. Create dictionary A and for F in Fs:
    3.1. Find serializer for F
        3.1.1. F is in declared fields, so the serializer is explicitly declared
    3.2. Save in dictionary A[field_name] = serializer_value
        3.2.1. If field has label set then field_name = label
        3.2.2. If field has attribute=True set then add this to dictionary A[__attributes__][field_name] = serializer_value
4. Resolve function serialized_value
    4.1. If Fs (and A) is empty:
        4.1.1. If function field_name returns None then return serialized_value
        4.1.2. Else return {field_name() : serialized_value()}
    4.2. Else
        4.2.1. If function field_name returns None then raise Exception
        4.2.2. Else A.update({field_name() : serialized_value()})
5. Return A

We have a dict (or list of dicts) from the first phase of serialization. Next, __attributes__ must be resolved (this depends on the format and strategy).

Deserialization (it's an early idea):

SomeSerializer.deserialize(D - python_native_datatype_objects (dict or list of dicts), instance=None, field_name=None, class_name=None, **options)

1. Get object instance  # Resolving this may be more complicated than I wrote below (e.g. based on D's fields - duck typing)
    1.1. If instance is not None then use it
    1.2. Else try to resolve class_name
        1.2.1. If class_name is a class object, instantiate it
        1.2.2. If class_name is a string then find the string value for this key in D and instantiate it
        1.2.3. If class_name is None raise Exception
2. Find all fields in D and find fields in Serializer for deserializing them
    2.1. Resolve the label attribute for fields
3. Pass instance, data D and field_name to all field Serializers
4. Return instance

I'm aware that there will be a lot of small issues, but I believe that the ideas are good.

-- Piotr Grabowski
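The steps listed for (Object|Model)Serializer.serialize can be sketched in a few lines. This is a simplified illustration, not the project's code: class attributes stand in for the Meta options, follow_object and type-based serializer lookup are left out, and storing attribute fields only under `__attributes__` is an assumption, since the steps do not say whether they also appear under their own key.

```python
class Field:
    """Hypothetical leaf field: reads one attribute from the object."""

    def __init__(self, label=None, attribute=False):
        self.label = label
        self.attribute = attribute

    def serialize(self, obj, field_name):
        return getattr(obj, field_name)


class ObjectSerializer:
    declared = {}                  # explicitly declared fields (step 2.1)
    include_default_fields = True  # stands in for Meta.include_default_fields
    exclude = ()                   # stands in for Meta.exclude

    def serialize(self, obj):
        # Step 1.1: iterables are serialized element by element.
        if isinstance(obj, (list, tuple)):
            return [self.serialize(item) for item in obj]
        # Step 2: collect the names of the fields to serialize.
        names = list(self.declared)
        if self.include_default_fields:
            names += [n for n in vars(obj) if n not in names and n not in self.exclude]
        # Step 3: pick a serializer for each field and store its value.
        result = {}
        for name in names:
            field = self.declared.get(name, Field())  # step 3.1
            value = field.serialize(obj, name)
            key = field.label or name                 # step 3.2.1
            if field.attribute:                       # step 3.2.2
                result.setdefault("__attributes__", {})[key] = value
            else:
                result[key] = value
        return result                                 # step 4


class Comment:
    def __init__(self, pk, text, secret):
        self.pk, self.text, self.secret = pk, text, secret


class CommentSerializer(ObjectSerializer):
    declared = {"pk": Field(attribute=True), "text": Field(label="body")}
    exclude = ("secret",)


native = CommentSerializer().serialize(Comment(1, "hi", "x"))
print(native)
```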
Re: Customizable Serialization check-in
Hi,

This week I started coding my project. It's available on the soc2012-serialization branch at https://github.com/grapo/django. I'm not very familiar with git, so I'm not sure that I'm doing it right:

* I forked the django repo on GitHub
* cloned it to my computer
* created a new branch soc2012
* work in this branch
* push it to origin

When I want to synchronize my branch with Django trunk, I will fetch master from upstream (django/django) and merge master into my branch. Is this flow good?

So far I have coded the base for Serializers and Fields. I haven't included any tests or documentation, so it can be hard to try out. I am pretty sure that writing appropriate docstrings will be a challenge for me :) I copied some metaclass code from Django forms and models. You can instantiate ObjectSerializer and try to serialize some simple Python objects with it. It will serialize all fields present in object.__dict__ and return Python native datatypes. The code is still at an early stage, so it's not polished and needs some refactoring, but if you have any tips for me I will be very grateful.

Next week I will fix some issues, code ModelSerializer, and write documentation and tests for what I have done so far. I must also think about renaming some functions to make the API more convenient.

-- Piotr Grabowski
Re: Customizable Serialization check-in
On 29.05.2012 02:28, Russell Keith-Magee wrote:

> Hi Piotr;
>
> Apologies for the delay in responding to your updated API.
>
> On Tue, May 22, 2012 at 6:59 AM, Piotr Grabowski wrote:
>> I do some changes to my previous API: (https://gist.github.com/2597306 <- change are included)
>> * which fields of object are default serialized. It's depend on include_default_field but opposite to Tom Christie solution it's default value is True so all fields (eventually specified in Meta.model_fields) are present
>
> Field options:
> ~~
>
> * There's a complication here that doesn't make sense to me. Following your syntax, the following would appear to be legal:
>
>     class FieldA(Field):
>         def serialize(…):
>         def deserialize(…):
>
>     class FieldB(Field):
>         to = FieldA()
>         def serialize(…):
>         def deserialize(…):
>
>     class FieldC(Field):
>         to = FieldB(attribute=True)
>         def serialize(…):
>         def deserialize(…):
>
> i.e., if Field allows declaration style definitions, and Field can be *used* in declaration style definitions, then it's possible to define them in a nested fashion -- at which point, it isn't clear to me what is going to be output. It seems to me that "attribute" shouldn't be an option on a field declaration; it should either be something that's encompassed in a similar way to serialise/deserialize (i.e., either additional input/output from the serialise methods, or a parallel pair of methods), or the use of a Field as a declarative definition implies that it is of type attribute, and prevents the use of field types that themselves have attributes.

In the example that you present, I thought about raising an exception when FieldC is defined. Another option is to define the class as being an attribute:

    class FieldB(Field):
        to = FieldA()
        def serialize(…):
        def deserialize(…):
        class Meta:
            attribute=True

Then raise an exception when FieldB is defined, because of the 'to' field.

Still, one of my principles is to have one Serializer for all formats (or at least the possibility of serializing a Serializer in each format), and attribute is something really problematic. About the value returned by Field.serialize (Serializer.serialize in general): now it is a dict with the key __attributes__; maybe it would be better to return a tuple (dict/field_value, attributes_dict), because of the issues when there is no field_name and attributes are present.

> Field methods:
> ~~~
>
> * serialize_value(), deserialize_value(); this is bike shedding, but is there any reason to not use just "serialize() and deserialize()"?

I'm using serialize and deserialize in my code. Serializer.serialize(...) returns a native Python datatype. It's a matter of naming, but in my opinion serialize is the method that should return the whole serialized Field/ObjectSerializer, not only part of the result (serialized_value returns only part of the data needed for Field serialization).

> ObjectSerializer methods:
>
> * Why does ObjectSerializer have options at all? How can it be "meta" operating on a generic object? Consider -- if you pass in an instance of an object, you'll need to use obj.field_name to access fields; if you pass in a dictionary, you'll need to use obj['field_name']. And if you're given a generic object what's the list of default fields to serialize? Like I said last time, ObjectSerializer should be completely definition based. Look at Django's Form base class - it has no "meta" concept -- it's fully declaration based. Then there's ModelForm, which has a meta class; but the output of the ModelForm could be completely manually generated using a base Form.

OK, I think I finally get this idea. Before, I thought of class Meta more as options for the class it is in. ObjectSerializer is now more like ModelForm than like Form. I have an idea of how to rewrite it, and I will let you know when it is done.

> * I mentioned this last time -- why is class_name a meta option, rather than a method on the base class with a default implementation? Having it as an Meta attribute

I answered you last time; I should add this to the proposal. Probably I don't understand the issue.

    def get_class(self, data):
        if self._meta.class_name is not None:
            if isinstance(self._meta.class_name, str):
                return object_from_string(data[self._meta.class_name])
            else:
                return self._meta.class_name
        raise Exception('No class for deserialization provided')

If someone wants more sophisticated resolving of the class from the data, he can override get_class. When I rewrite ObjectSerializer it will be different from this, but my idea is to have class_name as a shortcut for writing the get_class method.

> * I'm not wild about the way related_serializer seems to work, either. Again, like class_name, it seems like it should be a method, not an option. By making it an option, you're a
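A runnable sketch of the get_class idea from this message. Everything around the method is an assumption made for illustration: `object_from_string` is implemented here as a dotted-path import via the standard `importlib` module, and `Options` is a hypothetical stand-in for the `_meta` object.

```python
import decimal
import importlib


def object_from_string(dotted_path):
    """Hypothetical helper: 'package.module.ClassName' -> class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


class Options:
    """Hypothetical stand-in for the serializer's _meta."""

    def __init__(self, class_name=None):
        self.class_name = class_name


class ObjectSerializer:
    def __init__(self, class_name=None):
        self._meta = Options(class_name)

    def get_class(self, data):
        class_name = self._meta.class_name
        if class_name is None:
            raise ValueError("No class for deserialization provided")
        if isinstance(class_name, str):
            # A string names the key in the data that holds the class path.
            return object_from_string(data[class_name])
        # Otherwise class_name is already the class object.
        return class_name


by_key = ObjectSerializer(class_name="model").get_class({"model": "decimal.Decimal"})
by_class = ObjectSerializer(class_name=dict).get_class({})
print(by_key, by_class)
```

A subclass that needs different resolution (for example, guessing the class from the keys present in the data) would override `get_class` and ignore the option entirely.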
Re: Customizable Serialization check-in
Hi,

Sorry for being late with the weekly update. Because of some issues with Meta and my wrong understanding of metaclasses, which Russell pointed out, I spent time improving my knowledge of this. I also rewrote some parts of the code that I had written the week before. This week I will do what I was supposed to do last week: initial tests and documentation. After this week serialization should work with simple objects.

-- Piotr Grabowski
Re: Customizable Serialization check-in
Hi!

This week I managed to write the deserialization functions and tests.

*Issues with deserialization*

Working on deserialization gave me a lot of thoughts about the previous concepts. I rewrote the Field class so that a Field can no longer be nested. A Field can only have subfields if the subfields are attributes:

    class ContentField(Field):
        title = Field(attribute=True)  # valid
        content = Field()  # invalid -> raises an exception at class declaration time

        def serialized_value(...):
            ...

Of course, if ContentField is initialized as an attribute and has subfields, an exception is raised (when ContentField is initialized).

I changed the Python datatype format returned from the serializer.serialize method. Previously it was a dict with the serialized fields (label or field name as key) and a special key __attributes__ with a dict of attributes. Now it is a tuple (native, attributes) where native is a dict with the serialized fields (or a generator of dicts). serializer.deserialize always returns an object instance.

After the first phase of serialization, python_serialized_object will be serialized by a NativeFormat instance. Each format (json, xml, yaml, ...) has one NativeFormat that translates python_serialized_object to serialized_string. I want to be able to do this:

    object
    -> python_serial = object_serializer.serialize(object)
    -> string_serial = native_format.serialize(python_serial)
    -> python_deserial = native_format.deserialize(string_serial)
    -> object2 = object_serializer.deserialize(python_deserial)

where object2 has the same content as object. Now I have:

    object
    -> python_serial = object_serializer.serialize(object)
    -> object2 = object_serializer.deserialize(python_serial)

*Tests*

I wrote some tests (NativeSerializersTests) for ObjectSerializer in django/tests/modeltests/serializers/tests.py, but I'm not sure this is a good place for them. I used the model (Article) defined in models.py, but I used it like a normal object. Relation fields aren't serialized properly. So far I have tested the most important functions of ObjectSerializer:
Creating custom fields, attributes, renaming fields (using labels).

Next I want to resolve these issues:

* Instance creation when deserializing. I have a create_instance method and Meta.class_name. I must turn them into some public API.
* Ensure that the Field serialize method always returns simple native Python datatypes
* Write a NativeFormat for (at least) json
* Find better names for the already defined classes, methods and files
* More tests and documentation

When I have done this, serialization and deserialization will be more or less done for (non-model) Python objects.

-- Piotr Grabowski
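A minimal sketch of the two-phase round trip described in this check-in, using only the standard library. `Point`, `PointSerializer` and `JsonFormat` are hypothetical stand-ins invented for illustration; they are not the classes from the GSoC branch.

```python
import json


class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class PointSerializer(object):
    """Phase one: object <-> native Python datatypes (hypothetical class)."""
    fields = ("x", "y")

    def serialize(self, obj):
        return dict((name, getattr(obj, name)) for name in self.fields)

    def deserialize(self, native):
        instance = Point()
        for name in self.fields:
            setattr(instance, name, native[name])
        return instance


class JsonFormat(object):
    """Phase two: native datatypes <-> string (stand-in for a NativeFormat)."""

    def serialize(self, native):
        return json.dumps(native, sort_keys=True)

    def deserialize(self, string):
        return json.loads(string)


def round_trip(obj, object_serializer, native_format):
    # object -> python_serial -> string_serial -> python_deserial -> object2
    python_serial = object_serializer.serialize(obj)
    string_serial = native_format.serialize(python_serial)
    python_deserial = native_format.deserialize(string_serial)
    return object_serializer.deserialize(python_deserial)
```

Keeping the two phases behind separate objects means a second format only needs a new class with the same two methods as `JsonFormat`.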
Re: Customizable Serialization check-in
Hi!

This week I wrote simple serialization and deserialization for the json format, so it's now possible to encode objects to and from json:

    import django.core.serializers as s

    class Foo(object):
        def __init__(self):
            self.bar = [Bar(), Bar(), Bar()]
            self.x = "X"

    class Bar(object):
        def __init__(self):
            self.six = 6

    class MyField2(s.Field):
        def deserialized_value(self, obj, instance, field_name):
            pass

    class MyField(s.Field):
        x = MyField2(label="my_attribute", attribute=True)

        def serialized_value(self, obj, field_name):
            return getattr(obj, field_name, "No field like this")

        def deserialized_value(self, obj, instance, field_name):
            pass

    class BarSerializer(s.ObjectSerializer):
        class Meta:
            class_name = Bar

    class FooSerializer(s.ObjectSerializer):
        my_field = MyField(label="MYFIELD")
        bar = BarSerializer()

        class Meta:
            class_name = Foo

    foos = [Foo(), Foo(), Foo()]
    ser = s.serialize('json', foos, serializer=FooSerializer, indent=4)
    new_foos = s.deserialize('json', ser, deserializer=FooSerializer)

There are cases that I don't like:

* A deserialized_value function with an empty body: what to do with fields that we don't want to deserialize? There should be a better way to handle this.
* I pass in the list foos but get back the generator new_foos; bar in the Foo object is also a generator, not a list as in the input. Generators are better for performance, but if I put a list in I want a list out, not a generator. I don't know what to do about this.

Next week I will handle the rest of the issues that I mentioned in last week's check-in and refactor the json format (de)serialization: usage of streams and proper handling of parameters (like indent, etc.).

-- Piotr Grabowski
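One possible way to address the "list in, generator out" concern raised above, sketched under the assumption that deserialization goes through a single helper. `deserialize_many` is a hypothetical name, not part of the code in the branch.

```python
def deserialize_many(items, deserialize_one):
    """Deserialize each item, mirroring the container type of the input.

    Lists and tuples are materialized; any other iterable stays lazy.
    """
    results = (deserialize_one(item) for item in items)
    if isinstance(items, list):
        return list(results)
    if isinstance(items, tuple):
        return tuple(results)
    return results  # generators and other iterables are not consumed eagerly
```

The trade-off is the one named in the message: materializing costs memory, so this sketch only does it when the caller already held a concrete sequence.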
Re: Customizable Serialization check-in
On 20.06.2012 13:50, Tom Christie wrote:

> > deserialized_value function with empty content
>
> Are you asking about how to be able to differentiate between a field that deserializes to `None`, and a field that doesn't deserialize a value at all?

No :) I had this problem before and I managed to resolve it: the default deserialized_value doesn't return anything. It sets the field value.

    def deserialized_value(self, obj, instance, field_name):
        setattr(instance, field_name, obj)

This is how I do deserialization: pass the instance to the subfields, get it back from them (it should be the same instance, but in specific cases, e.g. an immutable instance, I can imagine that another instance of the same class is returned) and return it. If I don't declare a deserialized_value function then the function from the base class is used. That is the expected behavior. So how do I say "this field shouldn't be deserialized"? Right now I declare:

    def deserialized_value(self, obj, instance, field_name):
        pass

In fact I can do anything in this function except set a value on the instance, but declaring a function only to say "do nothing" isn't a good solution for me.

> > I changed python datatype format returned from serializer.serialize method. Now it is tuple (native, attributes)
>
> I'm not very keen on either this, or on the way that attributes are represented as fields. To me this looks like taking the particular requirements of serializing to xml, and baking them deep into the API, rather than treating them as a special case, and dealing with them in a more decoupled and extensible way. For example, I'd rather see an optional method `attributes` on the `Field` class that returns a dictionary of attributes. You'd then make sure that when you serialize into the native python datatypes prior to rendering, you also have some way of passing through the original Field instances to the renderer in order to provide any additional metadata that might be required in rendering the basic structure.
> Wiring up things this way around lets you support other formats that have extra information attached to the basic structure of the data. As an example use-case - In addition to json, yaml and xml, a developer might also want to be able to serialize to say, a tabular HTML output. In order to do this they might need to be able attach template_name or widget information to a field, that'd only be used if rendering to HTML. It might be that it's a bit late in the day for API changes like that, but hopefully it at least makes clear why I think that treating XML attributes as anything other than a special case isn't quite the right thing to do. - Just my personal opinion of course :)
>
> Regards, Tom

You are right that I shouldn't treat attributes as so special. I have an idea how to fix this. Where I returned (native, attributes) I will return (native, metainfo). It's only a matter of renaming, but metainfo will be more than attributes. In xml, metainfo can contain the attributes for a field; in html it can be the template_name or widget used for rendering. If I don't use metainfo in my serializer class then it's still universal and can be used for serialization to any format.

How to create metainfo? Having a `metainfo` method in the `Field` class that returns a dictionary seems to be a good idea, and it works for the html use case. But what to do with xml attributes again? :) They aren't only field meta information; they can also contain instance information that is valuable in deserialization (like the instance pk in the current Django solution), so they should be treated as fields and should have access to the instance in serialization and deserialization. My latest thought is that attributes should be treated as normal fields and live in the tuple's native object, and metainfo will tell xml which fields in native should be rendered as attributes.
After the first phase:

    native == {
        'field_1': value1,
        'field_2': value2,
        'field_3': value3,
    }
    metainfo == {
        'as_attributes': ['field_2', 'field_3'],
        'template_name': 'my_template',
    }

So if we use json in the second phase, field_2 and field_3 will be rendered the same way as field_1 because json doesn't read metainfo. Xml will render the fields according to metainfo['as_attributes']. Html will render the native dict using my_template.

-- Piotr Grabowski

On Tuesday, 19 June 2012 21:48:37 UTC+1, Piotr Grabowski wrote:

> Hi! This week I wrote simple serialization and deserialization for json format so it's possible now to encode objects from and to json:
>
>     import django.core.serializers as s
>
>     class Foo(object):
>         def __init__(self):
>             self.bar = [Bar(), Bar(), Bar()]
>             self.x = "X"
>
>     class Bar
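The (native, metainfo) idea from this message can be sketched with the standard library as follows. `render_json` and `render_xml` are hypothetical helper names, and the element name `object` is an assumption made only for the example.

```python
import json
import xml.etree.ElementTree as ET


def render_json(native, metainfo):
    # JSON has no notion of attributes, so metainfo is simply ignored.
    return json.dumps(native, sort_keys=True)


def render_xml(native, metainfo, tag="object"):
    # Fields listed in metainfo['as_attributes'] become XML attributes;
    # every other field becomes a child element.
    as_attributes = set(metainfo.get("as_attributes", []))
    root = ET.Element(tag)
    for name in sorted(native):
        if name in as_attributes:
            root.set(name, str(native[name]))
        else:
            child = ET.SubElement(root, name)
            child.text = str(native[name])
    return ET.tostring(root, encoding="unicode")
```

Both renderers receive the same pair, so the first phase does not need to know which format comes next; only the XML renderer reads `as_attributes`.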
Re: Customizable Serialization check-in
On 26.06.2012 11:52, Tom Christie wrote:

> > It is the way I am doing deserialization - pass instance to subfields
>
> Seems fine. It's worth keeping in mind that there's two ways around of doing this.
>
> 1. Create an empty instance first, then populate it with the field values in turn.
> 2. Populate a dictionary with the field values first, and then create an instance using those values.
>
> The current deserialization does something closer to the second. I don't know if there's any issues with doing things the other way around, but you'll want to consider which makes more sense.

The second approach assumes that every field returns some value. But what if we don't want to deserialize some field? In my deserialization the instance is passed to the field and the field eventually fills it with some value.

    def deserialize_value(self, obj, instance, field_name):
        setattr(instance, field_name, obj)

If we don't want to deserialize a field we simply do nothing in deserialize_value. If the second approach is used we must return a value. One idea is to mark the field as not deserializable:

    class MyField(Field):
        deserializable = False

> > Where I returned (native, attributes) I will return (native, metainfo). It's only matter of renaming but metainfo will be more than attributes.
>
> Again, there's two main ways around I can think of for populating metadata such as xml attributes.
>
> 1. Return the metadata upfront to the renderer.
> 2. Include some way for the renderer to get whatever metadata it needs at the point it's needed.
>
> This is one point where what I'm doing in django-serializers differs from your work, in that rather than return extra metadata upfront, the serializers return a dictionary-like object (that e.g. can be directly serialized to json or yaml), that also includes a way of returning the fields for each key (so that e.g. the xml renderer can call field.attributes() when it's rendering each field.) Again, you might decide that (1) makes more sense, but it's worth considering.
> As ever, if there's any of this you'd like to talk over off-list, feel free to drop me a mail - t...@tomchristie.com
>
> Regards, Tom

I rewrote this so it's more similar to django-serializers. But from the beginning: what did I do this week? :)

I agreed that xml attributes are overemphasized in my solution, so I want to change that. Attributes in xml are one of (two) ways of presenting information. I still want to have fields for attributes, but done this way:

    class MyField(Field):
        attr1 = Field()
        attr2 = Field()

        def serialized_value(self, obj, field_name):
            return field_value

        def metainfo(self):
            return {'attributes': ['attr1', 'attr2']}

JSON will skip the attributes entirely:

    some_field : field_value

XML will render it:

    <some_field attr1="val1" attr2="val2">field_value</some_field>

If metainfo doesn't return a dict with attributes, XML will render this:

    <some_field><attr1>val1</attr1><attr2>val2</attr2>field_value</some_field>

I coded it like django-serializers' DictWithMeta, but I added one more piece of functionality to represent a Field that has subfields and one extra value. I'm still not convinced it is a good solution, so I rewrote it several times but always ended up with something like that :) I will push the code tomorrow because I still want to do some tweaks.

-- Piotr Grabowski
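The `deserializable = False` idea mentioned earlier in this message could look roughly like the sketch below. The toy `Field` class and the `deserialize_fields` helper are assumptions made for illustration and do not reproduce the code in the branch.

```python
class Field(object):
    """Toy stand-in for the Field class discussed in the thread."""
    deserializable = True

    def deserialize_value(self, obj, instance, field_name):
        # Default behaviour from the message: set the value on the instance.
        setattr(instance, field_name, obj)


class ReadOnlyField(Field):
    # Declarative opt-out; no empty deserialize_value override is needed.
    deserializable = False


def deserialize_fields(instance, native, fields):
    """Fill `instance` from `native`, skipping non-deserializable fields."""
    for name, field in fields.items():
        if not field.deserializable or name not in native:
            continue
        field.deserialize_value(native[name], instance, name)
    return instance
```

With a flag like this the "create an instance first" and the "collect values first" approaches can both skip a field without relying on a method that does nothing.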
Re: Customizable Serialization check-in
Hi,

It is time for the midterm evaluation of my participation in GSoC, so in this check-in I want to summarize what I have done in the last month.

https://gist.github.com/3085250 - here is something that can serve as "documentation". I wrote some examples of ModelSerializer usage and how it should work.

https://github.com/grapo/django - the code that I wrote is in the branch soc2012-serialization.

There are still problems with the API and with how to do some things, but in my opinion it's going in the right direction.

Serialization and deserialization of Python objects is almost done. The API is quite stable; I used some ideas (and a little code) from https://github.com/tomchristie/django-serializers

Objects are serialized to metadicts, which are dicts with additional data. This additional data can be used by the format serializer to change the presentation of the data (e.g. attributes in xml).

Serialization of Django models has been started. I don't know which fields of a model should be serialized by default: for sure all the fields declared in the model. What about the pk field and reverse related fields?

The json dumpdata serializer is more or less written; I have not done field sorting yet.

I am sure that I can finish all this work before GSoC ends.

Sadly not everything is going well. In particular, my communication on this list and with my mentor should be improved. It's all my fault. I should have written check-ins more regularly and met the deadlines that I set. I am not very satisfied with the progress I have made. Much more could have been done in about one and a half months.

Regards,
Piotr Grabowski
Re: Customizable Serialization check-in
On 11.07.2012 14:04, Russell Keith-Magee wrote:

> > There is still problem with API and how to do some things but in my opinion it's going in right direction.
>
> Generally, I agree. I still have some concerns however; mostly around the things that you're putting onto the Meta class. related_serializer, for example -- Why is this a single attribute in the meta, rather than a method? By using an attribute, you're saying that on any given serializer, *all* related objects will be serialised the same, and I don't see why that should be the case.

Not *all* related objects, only those that aren't declared in the class definition. I think the related_serializer attribute is useful when you want to serialize all related objects in one way: to their primary key value, to their natural key value, or to the dumpdata format. If you want to make an exception for some fields then you declare it in the class definition.

    class MySerializer(ModelSerializer):
        special_object = SpecialSerializer()

        class Meta:
            related_serializer = PkSerializer

In this case all related objects except special_object will be serialized to their pk value. What more would you do with a related_serializer method? If you want to serialize some related objects with one serializer and some with another, the simplest way is to declare this in the class definition. I see only two cases where a method would be needed: if you want to choose the serializer by some pattern in the field name, or by the type of the related object (m2m, fk). Then you can override the get_object_field_serializer(self, obj, field_name) method to do it. By default this method returns related_serializer or field_serializer based on the field type. Maybe it would be a good idea to split this method in two, one for related objects and one for non-related ones. Then overriding it would be very similar to setting an attribute in Meta, but I think attributes are more "declarative".

> The same argument goes for class_name (which I think I've mentioned before), field_serializer, and so on.
And there is a method for that :)

    def create_instance(self, serialized_obj):
        if self.opts.class_name is not None:
            if isinstance(self.opts.class_name, str):
                return _get_model(serialized_obj[self.opts.class_name])()
            else:
                return self.opts.class_name()
        raise base.DeserializationError(u"Can't resolve class for object creation")

Maybe it isn't the proper way to do this, since there are two ways of doing the same operation, but I think this is the simplest solution for the end user.

> The only fields that I can see that *should* be declarative are 'fields' and 'exclude' -- and if you've been tracking django-dev recently, there's been a discussion about whether the idea of 'exclude' should be deprecated from Django APIs (due to potential security issues -- explicit inclusion is safer than implicit inclusion, because you can accidentally forget to exclude sensitive data from an output list)

I have read this discussion. I'm +1 on deprecating 'exclude' :) Personally I almost never use it.

> Some other API questions: Why is deserialized_value decoupled from set_object? It isn't obvious to me why this separation exists.

It's possible that I overcomplicated this. There are three methods: set_object, deserialize and deserialize_value. When you want to deserialize an object you should:

* Ensure that this is a proper object and not a list of objects or a dict (dicts in deserialization are another problem, which I present below). The 'deserialize' method handles this: it recursively deserializes lists and dicts.
* Do some processing on the object you get (e.g. change a string to an int). The 'deserialize_value' method handles this.
* Set this object on the upper-level object. The 'set_object' method handles this. There shouldn't be a reason to override it very often.

I think deserialize_value will be the method that users most often need to override. I would be willing to merge deserialize and deserialize_value, but set_object should be left as is.
Problem with deserializing dicts: in the current implementation there is no way, during deserialization, to tell whether a given dict is a serialized object or a dict of objects. So it might be better not to serialize dicts automatically but to leave it to the user's decision?

> I see where you're going with metainfo on fields (and that's a reasonably elegant way of tackling the problem of XML needing additional info to serialize), but what is the purpose of metadata on Serializers?
>
> Yours, Russ Magee %-)

Because a Serializer should also be able to give additional info to the format serializer, for example which fields should be treated as attributes (pk and model in dumpdata).

-- Piotr Grabowski
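The attribute-versus-method question discussed in this message can be illustrated with a declarative default that a method consults, so both styles coexist. Everything below is a hypothetical toy (plain dicts instead of models, invented class names) and only sketches the pattern, not the actual ModelSerializer API.

```python
class PkSerializer(object):
    def serialize(self, related):
        return related["pk"]


class NaturalKeySerializer(object):
    def serialize(self, related):
        return related["name"]


class BaseSerializer(object):
    # Declarative default, playing the role of Meta.related_serializer.
    related_serializer = PkSerializer

    def get_related_serializer(self, obj, field_name):
        # Default implementation just returns the declared class.
        return self.related_serializer()

    def serialize_related(self, obj, field_name):
        serializer = self.get_related_serializer(obj, field_name)
        return serializer.serialize(obj[field_name])


class PatternSerializer(BaseSerializer):
    def get_related_serializer(self, obj, field_name):
        # Choose a serializer by a pattern in the field name.
        if field_name.endswith("_by_name"):
            return NaturalKeySerializer()
        return super(PatternSerializer, self).get_related_serializer(obj, field_name)
```

The simple case stays a one-line declaration, while the two cases named above (field-name pattern, relation type) are handled by overriding one method.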
Re: Customizable Serialization check-in
Hi,

In the past 3 weeks my project has changed a lot. First of all, I changed the output of the first phase of serialization. Previously it was Python native datatypes. At some point I added a dictionary with metadata to it. The metadata was used in the second phase of serialization. Now the first phase returns an ObjectWithMetadata, which is a wrapper around Python native datatypes. It's a bit hackish, so I don't know whether it is a good solution:

    class ObjectWithMetadata(object):
        def __init__(self, obj, metadata=None, fields=None):
            self._object = obj
            self.metadata = metadata or {}
            self.fields = fields or {}

        def get_object(self):
            return self._object

        def __getattribute__(self, attr):
            if attr not in ['_object', 'metadata', 'fields', 'get_object']:
                return self._object.__getattribute__(attr)
            else:
                return object.__getattribute__(self, attr)

        # there are a few more methods like this (for acting like a
        # MutableMapping and an Iterable) and all are similar
        def __getitem__(self, key):
            return self._object.__getitem__(key)

        ...

Thanks to this solution, ObjectWithMetadata acts like the object stored in _object in almost all cases (including isinstance tests), and there is a place for storing additional data.

I didn't change deserialization, so its output is Python native datatypes without wrapping. I don't know if this is good, because there is no symmetry in this:

    Django object -> python native datatype packed in ObjectWithMetadata -> json -> python native datatype -> Django object

I have all dumpdata formats working now (xml, json, yaml). All tests pass, but there is a problem with the order of fields in yaml. It will be fixed soon.

I made a new format, new_xml, which is similar to json and yaml. It's easier to parse.
Old: rel="ManyToOneRel">1 rel="ManyToManyRel"> New: 1 1 2

There is also a problem with json and serialization to a stream: json uses an extension written in C (_json) for performance, and this leads to exceptions when ObjectWithMetadata is used, so before objects are passed to json.dumps they should be unpacked from ObjectWithMetadata.

There is probably no chance of achieving one of the most important requirements that I specified: using only one Serializer to serialize Django models to multiple formats:

    serializers.serialize('json', objects, serializer=MySerializer)
    serializers.serialize('xml', objects, serializer=MySerializer)

The trouble is with xml (as always ;). In xml every (model) field must be converted to a string before being serialized by the xml serializer. In json and yaml, if a field has a protected type (string, int, datetime etc.) then nothing is done with it. The conversion is done in the first phase because only there do we have access to field.value_to_string, the field method that is used to convert a field value to a string. It can be overridden by the user, so simply calling smart_unicode in the second phase instead isn't enough.

Most important tasks on the TODO list:

* handling natural keys
* tests
    x correctness
    x performance (I suspect my solution will be slower than the one currently used in Django, but by how much?)
* documentation

https://github.com/grapo/django/tree/soc2012-serialization/django/core/serializers

-- Piotr Grabowski
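One way to unpack wrapped values on the way into the standard `json` module is its `default` hook, which is called for any object the encoder does not know how to serialize. The sketch below uses a simplified `ObjectWithMetadata` that does not proxy attribute access, which is an assumption: the real wrapper passes isinstance tests and may therefore need to be unwrapped explicitly beforehand. `unwrap` and `dumps` are hypothetical helper names.

```python
import json


class ObjectWithMetadata(object):
    """Simplified wrapper: plain storage, no attribute proxying (assumption)."""

    def __init__(self, obj, metadata=None):
        self._object = obj
        self.metadata = metadata or {}

    def get_object(self):
        return self._object


def unwrap(value):
    # Called by json only for objects it cannot serialize itself.
    if isinstance(value, ObjectWithMetadata):
        return value.get_object()
    raise TypeError("%r is not JSON serializable" % (value,))


def dumps(wrapped, **options):
    # The returned object is encoded recursively, so nested wrappers
    # are unwrapped as the encoder reaches them.
    return json.dumps(wrapped, default=unwrap, **options)
```

Because the hook is part of the encoder interface, this works with the C-accelerated encoder as well as the pure Python one.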
Re: Customizable Serialization check-in
Hi,

Google Summer of Code is almost over. I was working on customizable serialization. This project was a lot harder than I expected, and sadly, in my opinion, I failed to do it right. I want to apologize for that, and especially for my poor communication with this group and my mentor. I wanted to improve it after the midterm evaluation but it only got worse.

I don't think my project is all wrong, but a lot of things are different from how I planned them.

How it looks (I wrote more in the documentation):

There is a Serializer class that is made of two classes: NativeSerializer and FormatSerializer.

* NativeSerializer is for serializing and deserializing Python objects to/from native Python datatypes.
* FormatSerializer is for serializing and deserializing native Python datatypes to/from some format (xml, json, yaml).

I wanted NativeSerializer to be fully independent of FormatSerializer (and vice versa), but this isn't possible. Either NativeSerializer must return some additional data or FormatSerializer must give NativeSerializer some context. For example, in xml all native Python datatypes must be serialized to strings before being serialized to xml. Some custom model fields can have a more sophisticated way of serializing to a string than unicode(), so `field.value_to_string` must be called, and `field` is only accessible in the NativeSerializer object. So either NativeSerializer also returns `field`, or FormatSerializer informs NativeSerializer that it handles only text data.

Backward-compatible dumpdata is almost working. Only a few tests fail, but I am not sure why.

Nested serialization of fk and m2m related fields, which was the main functionality of this project, is working but not well tested. There are some issues, especially with xml. I had to write a new xml format because the old one won't work with nested serialization.

I didn't do any performance tests. Running the full test suite takes 40 seconds longer with my serialization (about 1500 s in total), if I remember correctly.
I will try to complete this project so that it is at least bug-free and usable. If someone is interested in using nested serialization, there is another great project: https://github.com/tomchristie/django-serializers

Code: https://github.com/grapo/django/tree/soc2012-serialization
Documentation: https://gist.github.com/3085250

-- Piotr Grabowski
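The second option described in this message, where the format side tells the native side that it handles only text, might be sketched as below. The `text_only` context key and all class names are hypothetical; only the `value_to_string` hook name is taken from the message, and the toy `Field` here is not Django's model field.

```python
class Field(object):
    """Toy field; value_to_string mirrors the hook named in the message."""

    def __init__(self, name):
        self.name = name

    def value_to_string(self, obj):
        return str(getattr(obj, self.name))


class PointField(Field):
    def value_to_string(self, obj):
        # A custom field with its own text representation.
        x, y = getattr(obj, self.name)
        return "%s;%s" % (x, y)


class NativeSerializer(object):
    def __init__(self, fields):
        self.fields = fields

    def serialize(self, obj, context):
        # Only this phase has access to the field objects, so the
        # conversion to text has to happen here when it is requested.
        text_only = context.get("text_only", False)
        native = {}
        for field in self.fields:
            if text_only:
                native[field.name] = field.value_to_string(obj)
            else:
                native[field.name] = getattr(obj, field.name)
        return native


class JsonFormat(object):
    context = {"text_only": False}


class XmlFormat(object):
    context = {"text_only": True}
```

The context carries a capability of the format rather than its name, which keeps the native phase unaware of xml specifically.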
Re: Moving forward with Serialization.
On 31.08.2012 10:25, Tom Christie wrote:

> > I personally think that Forms are already the place that should handle (de)serialisation. They already serialise to HTML: why should they not be able to serialise to other stream types?
>
> Conceptually I agree. As it happens django-serializers is perfectly capable of rendering into HTML forms, I just haven't yet gotten around to writing a form renderer, since it was out-of-scope of the fixture serialization functionality. Pragmatically, I'm not convinced it'd work very well. The existing Forms implementation is tightly coupled to form-data input and HTML output, and I think trying to address that without breaking backwards compatibility would be rather difficult. It's maybe easy enough to do for flat representations, and pk relationships, but extending it to deal with nested representations, being able to use a Form as a field on another Form, and representing custom relationships would all take some serious hacking. My personal opinion is that whatever benefits you'd gain in DRYness, you'd lose in code complexity. Having said that, if someone was able to hack together a Forms-based fixture serialization/deserialization implementation that passes the Django test suite, and didn't look too kludgy, I'd be perfectly willing to revise my opinion.

I am not quite sure, but I think Forms should be built on top of some serialization API, not the other way around. Forms are a more specific kind of model serialization: they are models serialized to html (a specific format) with some validation (specific actions) when deserializing.

I like Tom's django-serializers, but there are some things that I want to mention:

* The process of serialization is split into two parts: transformation to Python native datatypes (serializer) and then to a specific text format (renderer). But when serializing, the Field is also saved with the data, so it's not so clean.
I also have issues with this, but I resolve them in a different way (not to say better :)

* In the master branch the Serializer is closely tied to the Renderer, so if there is a different Renderer class then a new Serializer is needed. In the forms branch this is done in the serialize method in __init__, and this must be rewritten for backward compatibility if django-serializers goes into core. I want to propose my solution [1]. For each format there is a Serializer class which is made from a NativeSerializer (from models to Python native datatypes) and a FormatSerializer (Renderer):

      class Serializer(object):
          # class for native python serialization/deserialization
          SerializerClass = NativeSerializer
          # class for specific format serialization/deserialization
          RendererClass = FormatSerializer

          def serialize(self, queryset, **options):
              ...

          def deserialize(self, stream_or_string, **options):
              ...

      Deserializer = Serializer

  This is fully backward compatible and the user can do:

      serializers.serialize('registered_format', objects, serializer=MyNativeSerializer)

  This will make a new Serializer class with SerializerClass == MyNativeSerializer. In this solution NativeSerializer and FormatSerializer are more independent. In my solution each NativeSerializer can be rendered by each FormatSerializer, but it's not so simple. FormatSerializer provides NativeSerializer with some context, so you could say that NativeSerializer knows what format it will be serialized to. It's not exactly the format but some metadata about it. I am not proud of this :/

* IMO there is a bug related to xml. All model fields must be transformed to text before xml serialization. In the current Django serialization framework the field's value_to_string method is responsible for this. In django-serializers this method is not always called, so it can lead to errors with custom model fields.

[1] https://github.com/grapo/django/tree/soc2012-serialization/django/core/serializers

-- Piotr Grabowski
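The composition proposed in this message, including the step where passing `serializer=` produces a new Serializer class, can be sketched with the built-in `type()`. The registry, the `JsonRenderer` class and the module-level `serialize` function below are hypothetical simplifications, not the code from the linked branch.

```python
import json


class NativeSerializer(object):
    """Phase one: object -> native datatypes (toy version using vars())."""

    def serialize(self, obj):
        return dict(vars(obj))


class JsonRenderer(object):
    """Phase two: native datatypes -> string."""

    def serialize(self, native, **options):
        return json.dumps(native, sort_keys=True, **options)


class Serializer(object):
    SerializerClass = NativeSerializer
    RendererClass = JsonRenderer

    def serialize(self, objects, **options):
        native = [self.SerializerClass().serialize(obj) for obj in objects]
        return self.RendererClass().serialize(native, **options)


REGISTRY = {"json": Serializer}  # hypothetical format registry


def serialize(format, objects, serializer=None, **options):
    base = REGISTRY[format]
    if serializer is not None:
        # Build a subclass on the fly that only swaps SerializerClass.
        base = type("Custom" + base.__name__, (base,),
                    {"SerializerClass": serializer})
    return base().serialize(objects, **options)
```

Creating a subclass instead of mutating the registered class keeps the default behaviour for callers that do not pass `serializer=`.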