[GSOC 2012] Customizable serialization

2012-03-20 Thread Piotr Grabowski

Hi,

My name is Piotr Grabowski. I'm a final-year student at the Institute of 
Computer Science, University of Wrocław (Poland). I want to share with 
you a draft of my GSoC proposal.


http://pastebin.com/ePRUj5HC

PS. Sorry for my poor English :/

--
Piotr Grabowski

--
You received this message because you are subscribed to the Google Groups "Django 
developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: [GSOC 2012] Customizable serialization

2012-03-20 Thread Piotr Grabowski

On 20.03.2012 22:00, Łukasz Rekucki wrote:

1) The Meta.structure thing looks like a non-starter to me. It's
a DSL inside a DSL. I'm also not sure what it actually does - you
already have all those @attribute decorators, why repeat their names
in some string?

One of my principles was to let the user define any possible structure.
Ex.




Django





1. With Meta.structure you can do:
    def name
        return Django
    structure = "a[b[c[name__field]]]"

2. Without:
    def a
        return BFieldSerializer
    class BFieldSerializer
        def b
            ...

You see my point? I agree that the second solution is more elegant, but the first 
is a lot faster. The question is whether anyone actually wants or needs to define 
a structure tree like this.


Ex2.


...
...






1. structure="model_field1__field model_field2__field 
special_model_fields{model_field3__field model_field4__field}"
or structure="__fields special_model_fields{model_field3__field 
model_field4__field}"


2. Even if model_field1/2 are automatically put in the right place, what to do 
with 3/4?

    def special_model_fields
        return {'model_field_3' : model_field_3, model... }

Hmm, I wanted to prove that structure would be better in this case, but I came 
up with the above idea :)
If Serializer methods can return base type objects, FieldSerializers, 
[] and {}, we can define anything :) And it's a lot better than structure!

I must rewrite my proposal :)




 3) Did you think about splitting the serialization process in
two parts (dehydration + presentation)? This is what most REST
frameworks do. First you serialize the objects into Python native
types, then render it to any format.


Yes, in my solution anything at the end of the first phase will be a Python 
base type, a BaseFieldSerializer subclass or a BaseModelSerializer subclass 
with resolved Meta.structure. If I remove Meta.structure it will be even 
simpler. I can resolve Base(Model/Field)Serializer only when I know 
which format it will be serialized to.
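A minimal sketch of the two-phase split discussed above, assuming plain Python objects rather than Django models. The names `dehydrate`, `render_json` and `Article` are hypothetical and are not part of the proposal or of Django.

```python
import json
from datetime import date


def dehydrate(obj, field_names):
    """Phase 1: reduce an object to plain Python types (dict, str, int, ...)."""
    native = {}
    for name in field_names:
        value = getattr(obj, name)
        # Dates are not JSON-serializable, so reduce them to ISO strings here.
        if isinstance(value, date):
            value = value.isoformat()
        native[name] = value
    return native


def render_json(native):
    """Phase 2: turn the native structure into one concrete format."""
    return json.dumps(native, sort_keys=True)


class Article:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, title, published):
        self.title = title
        self.published = published


native = dehydrate(Article("Django", date(2012, 3, 20)), ["title", "published"])
rendered = render_json(native)
```

Only the second phase knows about the output format, so a YAML or XML renderer could consume the same `native` value.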




--
Piotr Grabowski




Re: [GSOC 2012] Customizable serialization

2012-04-02 Thread Piotr Grabowski
ld):
return smart_unicode(obj._meta)

#no need of hydrate__value__


class JSONSerializer(ModelSerializer):
    pk = PKField(attribute=True)
    model = ModelField(attribute=True)

    class Meta:
        aliases = {'__fields__' : 'fields'}
        relation_serializer = FlatSerializer


class XMLSerializer(JSONSerializer):
    class Meta:
        aliases = {'__fields__' : 'field'}
        default_field_serializer = XMLFieldSerializer
        default_relation_serializer = XMLFlatRelationSerializer


class XMLFieldSerializer(Field):

    @attribute
    def name(self, name, obj):
        ...

    @attribute
    def type(self, name, obj):
        ...


class XMLFlatRelationSerializer(Field):

    @attribute
    def to
        ...

    @attribute
    def name
        ...

    @attribute
    def rel
        ...

-
Schedule
-
I want to work approximately 20 hours per week: 15 hours writing code 
and the rest on tests and documentation.


Before start: Discussion on API design. I hope everything will be 
clear before I start writing code.

Week 1-2: Developing base code for Serializer.
Week 3-4: Developing first phase of serialization.
Week 5: Developing second phase of deserialization.
Week 6: Developing second phase of serialization and first phase of 
deserialization.
Mid-term evaluation: I will have a working Serializer, except for 
nested relations.

Week 7-8: Handling nested ForeignKeys and M2M fields.
Week 9: Implementing the old serialization in the new API with backward compatibility.
Week 10: Regression tests, writing documentation.
Week 11-12: Buffer weeks.


-
About
-
My name is Piotr Grabowski. I'm a final-year student at the Institute of 
Computer Science, University of Wrocław (Poland). I've been working with 
Django for 2 years. Python is my preferred programming language, but I 
have also been using Ruby (& Rails) and JavaScript.




--
Piotr Grabowski




Re: [GSOC 2012] Customizable serialization

2012-04-03 Thread Piotr Grabowski
ust have 'fields', 'include', 'exclude'.  If fields is None then use 
the 'default set of fields' + 'include' - 'exclude'.  If fields is not 
None, use that and ignore include/exclude.
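The field-selection rule quoted above can be sketched as a small function. This is only an illustration under stated assumptions: `resolve_fields` and its parameter names are hypothetical, and the real API under discussion may resolve fields differently.

```python
def resolve_fields(default_fields, fields=None, include=(), exclude=()):
    """Return the field names to serialize, in a stable order."""
    if fields is not None:
        # An explicit 'fields' list wins; include/exclude are ignored.
        return list(fields)
    selected = list(default_fields)
    # Add 'include' names that are not already in the default set.
    selected += [name for name in include if name not in selected]
    # Finally drop everything listed in 'exclude'.
    return [name for name in selected if name not in exclude]


chosen = resolve_fields(["id", "title"], include=["slug"], exclude=["id"])
explicit = resolve_fields(["id", "title"], fields=["title"], include=["slug"])
```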

Attribute is for xml attribute ...

* I wouldn't consider special casing for XML serialization in the 
complex<->native stage.  Sure, yeah, make sure there's an XML 
implementation that can handle the current Django XML serialization 
structure, but anything more than that and you're likely to end up 
muddying the API for a special case of data format.



* 'relation_reserialize' - Why is that needed?

class Photo
sender = User
person_on_photo = User

If p.sender == p.person_on_photo, maybe we want to serialize it twice, 
or maybe we want only sender : {serialized_sender}, person_on_photo 
: 10



* 'object_name' - It's not obvious to me if that's necessary or not.
Now every serialized object in XML (in root) is <object>.
What if we want <obj>? We use object_name="obj".


* "In what field of serialized input is stored model class name" - 
What about when the class name isn't stored in the serialization data?

The first problem is what type of object is in the serialized input. There are
two ways to find it. You can pass the Model class as an argument to
serialization.serialize, or specify in Meta.model_name which field
contains information about the type.


* "dehydrate__xxx redefining serialization for type xxx."  I'm not 
convinced about that - it's not very pythonic to rely on 
type hierarchy in preference to duck typing.
Suppose our model has 10 DateTimeFields and we want to serialize only the 
date. We use dehydrate__datetime to do it.
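A rough sketch of the type-based hook described above, assuming the hook is looked up by the value's type name. `TypeHookSerializer`, `DateOnlySerializer` and `Event` are hypothetical names used only for illustration.

```python
from datetime import datetime


class TypeHookSerializer:
    """Looks up a dehydrate__<typename> method for each field value."""

    def dehydrate(self, value):
        hook = getattr(self, "dehydrate__" + type(value).__name__, None)
        # Values without a matching hook are passed through unchanged.
        return hook(value) if hook is not None else value

    def serialize(self, obj, field_names):
        return {name: self.dehydrate(getattr(obj, name)) for name in field_names}


class DateOnlySerializer(TypeHookSerializer):
    def dehydrate__datetime(self, value):
        # One hook covers every datetime field on the object.
        return value.date().isoformat()


class Event:
    """Hypothetical object with several datetime fields."""

    def __init__(self):
        self.name = "check-in"
        self.created = datetime(2012, 4, 3, 10, 30)
        self.updated = datetime(2012, 4, 27, 11, 14)


result = DateOnlySerializer().serialize(Event(), ["name", "created", "updated"])
```

Because the lookup uses the exact type name, subclasses of `datetime` would not match; that is the type-hierarchy concern raised in the quoted comment.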





Cheers,

  Tom


Thanks for your reply.

--
Piotr Grabowski




[GSoC] Customizable Serialization check-in

2012-04-27 Thread Piotr Grabowski

Hi!

I'm Piotr Grabowski, a student at the University of Wrocław, Poland.
In this Google Summer of Code I will deal with the problem of customizable 
serialization in Django.


You can find my proposal here: https://gist.github.com/2319638

It's obviously not a finished idea, and it certainly needs to be simplified. 
My mentor, Russell Keith-Magee, told me to look at Tom Christie's 
serialization API. I found it similar to my proposal. There is a lot in 
common (declarative fields, the same approach to various aspects of 
serialization), but his API is simpler and it feels better.


Since Tom has already posted to the group about his project, I can refer to it:

On 27.04.2012 06:44, Tom Christie wrote:

...

Given that Piotr's GSoC proposal has now been accepted, I'm wondering what the
right way forward is?  I'd like to continue to push forward with this, but I'm
also aware that it might be a bit of an issue if there's already an ongoing
GSoC project along the same lines?

Having taken a good look through the GSoC proposal, it looks good, and there
seems to be a fair bit of overlap, so hopefully he'll find what I've done
useful, and I'm sure I'll have plenty of comments on his project as it
progresses.

I'd consider suggesting a collaborative approach, but the rules of the GSoC
wouldn't allow that right?

--
Like I said above, your work will be very useful for me. I must read the 
GSoC regulations carefully, but collaboration on writing the code is 
certainly impossible. I don't know whether I could use your existing code base, but 
I think that is also impossible. However, sharing ideas and discussing how the 
API should look and work would be very welcome.



My plan for the next few weeks is to meet the Django contribution requirements, 
solve a ticket to prove I know the process of doing it and, most 
importantly, have a discussion about the serialization API. I hope the community 
will be interested in this feature.


After the weekend I will post my proposal with updates from Tom's API.

--
Piotr Grabowski





Re: Customizable Serialization check-in

2012-04-27 Thread Piotr Grabowski

On 27.04.2012 10:36, Anssi Kääriäinen wrote:

On Apr 27, 11:14 am, Piotr Grabowski  wrote:

Hi!

I'm Piotr Grabowski, student from University of Wroclaw, Poland
In this Google Summer of Code I will  deal with problem of customizable
serialization in Django.

You can find my proposal here https://gist.github.com/2319638

I quickly skimmed the proposal and I noticed speed/performance wasn't
mentioned. I believe performance is important in serialization and
especially in deserialization. It is not the number one priority item,
but it might be worth it to write a couple of benchmarks (preferably
to djangobench [1]) and check that there are no big regressions
introduced by your work. If somebody already has good real-life
testcases available, please share them...

  - Anssi

[1] https://github.com/jacobian/djangobench/


I didn't think about performance a lot. There will be regressions.
Serialization is currently very simple: iterate over the fields, transform each into 
a string (or something serializable), and serialize it with json|yaml|xml.
In my approach it is: transform the (Model) object into a Serializer object, 
where each field of the original object is a FieldSerializer object; next (maybe 
recursively) get a native Python object from each field; then serialize it 
with json|yaml|xml.
I can do some optimizations in this process, but it's clear it will 
take longer to serialize (and deserialize) an object than it does now. The time 
taken by tests can become a problem if there are a lot of fixtures.
I will try to write good, fast code, but I will be very glad if someone 
gives me tips about performance bottlenecks in it.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-04-27 Thread Piotr Grabowski

On 27.04.2012 12:39, Tom Christie wrote:

Hey Piotr,


> I quickly skimmed the proposal and I noticed speed/performance wasn't
mentioned. I believe performance is important in serialization and
especially in deserialization.

Right.  Also worth considering is making sure the API can deal with 
streaming large querysets, rather than loading all the data into memory at once.
(See also https://code.djangoproject.com/ticket/5423)

- Tom.

Maybe it can be done with a chain of two black-box generators. The first 
generator's inputs are a queryset (an iterable sequence) and a user-defined 
Serializer class that describes how to transform a single object; its output is 
Python primitive type objects. The second is fed with these objects and 
outputs the serialized string. What about nested objects - more generators? 
Generators are good because we can also reuse Serializer objects, which means 
better performance. But like Anssi said - optimize after the code is 
written, not before :)
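A minimal sketch of the two chained generators described above, assuming JSON output and a plain iterator standing in for a queryset. `to_native` and `to_json_chunks` are hypothetical names; nested objects are not handled.

```python
import json


def to_native(objects, serialize_one):
    """First generator: lazily turn each object into native Python types."""
    for obj in objects:
        yield serialize_one(obj)


def to_json_chunks(natives):
    """Second generator: emit a JSON array piece by piece."""
    yield "["
    for index, native in enumerate(natives):
        if index:
            yield ", "
        yield json.dumps(native, sort_keys=True)
    yield "]"


# Stand-in for a queryset iterator; only one row is held in memory at a time.
rows = iter([{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}])
chunks = to_json_chunks(
    to_native(rows, lambda row: {"id": row["pk"], "name": row["name"]})
)
output = "".join(chunks)
```

In a streaming setting the chunks would be written to a file or response as they are produced instead of being joined into one string.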


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-04 Thread Piotr Grabowski

Hi,

I had a lot of work this week, so I didn't manage to present my 
revised proposal on Monday like I said. Sorry. I have it now:

https://gist.github.com/2597306

I hope there will be some discussion about my proposal next week. I will 
also think about how it should be done under the hood. There should be some 
internal API. I should also resolve one Django ticket. I am thinking about 
this one: https://code.djangoproject.com/ticket/9279 It will be a good source of 
test cases for my future solution.


Should I write my proposal on this group? On GitHub I have nice 
formatting, and on this group my Python code was badly formatted.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-07 Thread Piotr Grabowski
 The
big difference between XML and JSON is that XML allows for values to
be packed as attributes. I can see that you've got an 'attribute'
argument on a Field, but it isn't clear to me how JSON would interpret
this, or how XML would interpret:


I have considered this a lot. I have two ideas: JSON will drop fields with 
attribute=True, or JSON will treat them like any other field. The second is better 
in my opinion.


   - A Field that had multiple sub-Fields, all of which were attribute=True
   - A Field that had multiple sub-Fields, several of which were attribute=False
   - The difference between these two definitions by your formatting rules:


 subval


key = KeyField()

class KeyField(Field):
    attr1 = A1Field(attribute=True)
    attr2 = A2Field(attribute=True)

    def field_name(self, obj, field_name):
        return 'subkey'

    def serialize_field_value(self, obj, field_name):
        return 'subval'

Will work in xml and json.


main value

class KeyField(Field):
    attr1 = A1Field(attribute=True)
    attr2 = A2Field(attribute=True)

    def serialize_field_value(self, obj, field_name):
        return 'main_value'

This works in XML but fails in JSON:

key : {
    attr1 : 'val1',
    attr2 : 'val2',
    ? : 'main_value'
}
It must raise an exception.
I don't know if this is acceptable: the same Field will work in XML and 
fail in JSON. This is not the fault of the XML attribute. We can fix that by 
dropping attributes in JSON and ensuring that 
if subfields are declared in a field (with attribute=False in at least one 
of them), then field_name must also be declared.
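The two JSON strategies and the failing case can be sketched as follows. This is an assumption-laden illustration: `resolve_for_json` and its parameters are hypothetical, and `value_name` plays the role of what `field_name()` returns in the examples above.

```python
def resolve_for_json(attributes, value, value_name=None, drop_attributes=False):
    """Fold XML-style attributes into a JSON-friendly structure.

    attributes: dict of values from fields declared with attribute=True
    value: the field's main value
    value_name: key for the main value, or None when no name is declared
    """
    if drop_attributes or not attributes:
        # Strategy 1: JSON simply ignores the attributes.
        return value if value_name is None else {value_name: value}
    if value_name is None:
        # The "? : 'main_value'" case: no key to store the main value under.
        raise ValueError("main value needs a name when attributes are kept")
    # Strategy 2: attributes are treated like any other key.
    result = dict(attributes)
    result[value_name] = value
    return result


attrs = {"attr1": "val1", "attr2": "val2"}
merged = resolve_for_json(attrs, "subval", "subkey")
dropped = resolve_for_json(attrs, "main_value", drop_attributes=True)
```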


In particular, why is the top level structure of the JSON serializer
handled with nested Serializers, but the structure of the XML
serializer is handled with nested Fields?

I don't understand. The XML serializer is also handled with a Serializer:
class XMLDumpDataSerializer(YJDumpDataSerializer)
YJDumpDataSerializer is the JSON serializer, and it is a Serializer.


Yours, Russ Magee %-) 



--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-07 Thread Piotr Grabowski
None is handled for different types, 
down to making sure you preserve the correct field ordering across 
each of json/yaml/xml.  I *think* that getting the details of all of 
those will end up being awkward to express using your current approach.
The second approach would be to a dict-like format, that can easily be 
encoded into json or yaml, but that can also include metadata specific 
to particular encodings such as xml (or perhaps, say, html).  You'd 
have a generic xml renderer, that handles encoding into fields and 
attributes in a fairly obvious way, and a dumpdata-specific renderer, 
that handles the odd edge cases that the dumpdata xml format requires. 
 The dumpdata-specific renderer would use the same intermediate data 
that's used for json and yaml.
I can't agree with that. The differences between the existing 
XML and JSON serializer output formats are too big. There is a field 'fields' in JSON 
and 'field' in XML. XML has attributes and JSON does not. That is only 
presentation, and these two cases could be handled in the second phase (in 
the renderer). But there is one big difference: XML has the additional fields 
'to', 'rel' and 'type', and these are not presentation. They are information.


The next (and maybe most important) thing to consider is what the user 
should know about formats to be able to serialize his data. In your 
approach the user should be familiar with, for example, SimpleXMLGenerator, 
because if he wants


xml

...
...


and json
{
items : [ ..., ...],
}

then he must write at least one renderer to transform 'items' to 'item', 
like you did in DumpDataXMLRenderer in django-serializers. I can't 
accept that. Don't get me wrong, I adopted a lot of your ideas from 
django-serializers and I think it is a very good project. But you shouldn't force 
users to know anything about generating XML or any other format. Maybe 
you should create some metalanguage for the user to say what he wants, 
like:


 "I want the field 'items' to be transformed to 'item' in XML (but I 
don't know how to do it)"  ->


class DumpDataSerializer(ModelSerializer):
    """
    A serializer that is intended to produce dumpdata formatted structures.
    """
    renderer_options = {
        'xml': { 'transform' : {'fields' : 'field'}} ,
    }

It's ugly but I hope you understand my idea.
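One way such a declarative 'transform' option could be applied by a renderer is sketched below. `apply_transform` is a hypothetical helper, not part of django-serializers or of the proposal, and it renames keys at every nesting level, which may be broader than intended.

```python
def apply_transform(native, renames):
    """Rename dict keys recursively according to a {old: new} mapping."""
    if isinstance(native, dict):
        return {
            renames.get(key, key): apply_transform(value, renames)
            for key, value in native.items()
        }
    if isinstance(native, list):
        return [apply_transform(item, renames) for item in native]
    return native


# Mirrors the option from the mail: only the XML renderer renames the key.
renderer_options = {"xml": {"transform": {"fields": "field"}}}

data = {"pk": 1, "fields": {"name": "Django"}}
for_xml = apply_transform(data, renderer_options["xml"]["transform"])
```

The user states the rename once, and the renderer applies it without the user having to know how XML is generated.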



I hope all of that makes sense, let me know if I've not explained 
myself very well anywhere.


Regards,

  Tom



--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-12 Thread Piotr Grabowski

Hi,

This week I thought about the internal API for Serializer. I want 
developers to eventually be able to use it for better customization of their 
solutions.


Next week I must study for my exams, so I suppose I will not do much on the 
serialization project. I will try to resolve some issues with my API 
that Tom Christie pointed out.


I know that I haven't done much, but at the end of the semester I have many 
tasks related to my studies. After the end of May I will have much more time.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-20 Thread Piotr Grabowski

Hi,

This week I was focused on my exams. Now I have more time for the 
serialization project. Sadly, the API isn't finished yet. 21 May in the GSoC 
calendar is the date to start coding. Tomorrow I will send updates to the API 
proposal and present the idea of the algorithm (maybe "list of steps" would 
be a better name) used for serialization. On Wednesday 23 May I want to start 
coding, and on Saturday 27 May I will write the next check-in and present my 
initial code.


The first thing I want to code is the basis for the serializers.serializer method 
and the Serializer and Field classes. After the first two weeks I want to be able to 
serialize very simple objects to JSON. Like I wrote in my first 
proposal, I'm ready to spend 20 hours per week on this. In the first two 
weeks it will be less due to my studies.



--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-21 Thread Piotr Grabowski
I made some changes to my previous API (https://gist.github.com/2597306 
<- changes are included):


 * Which fields of an object are serialized by default. It depends on 
include_default_field, but unlike in Tom Christie's solution its default 
value is True, so all fields (optionally restricted by Meta.model_fields) 
are present.

 * The follow_object attribute. In short: which object a 
Serializer's child Serializer should work on. Tom wrote about this in a previous mail, but 
I didn't fully understand the problem, so I gave him a bad answer. It's 
better described in the algorithm I present.


 * Got rid of the aliases and preserve_field_ordering fields.

 * Changed the class hierarchy:
    class Serializer(object) # base class for serializing
    class Field(Serializer) # class for serializing fields in objects
    class ObjectSerializer(Serializer) # class for serializing objects
    class ModelSerializer(Serializer) # class for serializing Django Models


I prepared a list of steps for the first phase of serialization. It's written 
in English-Python pseudo code :) I hope the indentation will be preserved.
Serializer.serialize is a function that, for an object, returns a dict of 
Python native datatypes.


(Object|Model)Serializer.serialize(object, field_name (can be None), **options)

1. Get object
    1.1. If object is iterable then do this algorithm for all elements 
         and return list of returned values
    1.2. If field_name for object is set from upper level we have 
         object Obj:
        1.2.1. If Meta.follow_object == True then work on object Obj.field_name
        1.2.2. Else work on Obj

2. Find all fields Fs that should be serialized
    2.1. Get all fields declared in Serializer
    2.2. Get all fields from Meta.fields
    2.3. If Meta.include_default_fields = True then get all fields where 
         type is valid in Meta.model_fields and not in Meta.exclude

3. Create dictionary A and for F in Fs:
    3.1. Find serializer for F
        3.1.1. If F is declared in Serializer then serializer is 
               explicit declared
        3.1.2. Else get serializer for F type (m2m related etc)
    3.2. Save in dictionary A[field_name] = serializer_value
        3.2.1. If field has set label then field_name = label
        3.2.2. If field has set attribute=True then add this to 
               dictionary A[__attributes__][field_name] = serializer_value

4. Return A
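The steps above can be sketched in Python roughly as follows. This is a simplified illustration, not the project's code: it works on plain objects, skips follow_object (step 1.2) and Meta.fields (step 2.2), and assumes that an attribute=True value is stored only under `__attributes__`. The `declared` dict and the `Point` classes are hypothetical.

```python
class Field:
    """Hypothetical declared field: reads one attribute, may rename it."""

    def __init__(self, label=None, attribute=False):
        self.label = label
        self.attribute = attribute

    def serialize(self, obj, field_name):
        return getattr(obj, field_name)


class ObjectSerializer:
    declared = {}                   # step 2.1: explicitly declared fields
    include_default_fields = True   # step 2.3
    exclude = ()

    def serialize(self, obj):
        # Step 1.1: iterables are serialized element by element.
        if isinstance(obj, (list, tuple)):
            return [self.serialize(item) for item in obj]
        # Step 2: collect the names of the fields to serialize.
        names = list(self.declared)
        if self.include_default_fields:
            names += [n for n in vars(obj)
                      if n not in names and n not in self.exclude]
        # Step 3: build dictionary A.
        result = {}
        for name in names:
            field = self.declared.get(name, Field())  # 3.1: explicit or default
            value = field.serialize(obj, name)
            key = field.label or name                 # 3.2.1: label renames
            if field.attribute:                       # 3.2.2
                result.setdefault("__attributes__", {})[key] = value
            else:
                result[key] = value
        return result                                 # step 4


class Point:
    def __init__(self, x, y):
        self.x, self.y, self.secret = x, y, "hidden"


class PointSerializer(ObjectSerializer):
    declared = {"x": Field(label="lon", attribute=True)}
    exclude = ("secret",)


single = PointSerializer().serialize(Point(1, 2))
many = PointSerializer().serialize([Point(1, 2), Point(3, 4)])
```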


Field.serialize(object, field_name (can be None), **options)
1. Get object
    1.1. If it is iterable then do this algorithm for all elements
    1.2. Work on object Obj passed from upper level

2. Find all fields Fs that should be serialized
    2.1. Get all fields from declared fields

3. Create dictionary A and for F in Fs:
    3.1. Find serializer for F
        3.1.1. F is in declared fields so serializer is explicit declared
    3.2. Save in dictionary A[field_name] = serializer_value
        3.2.1. If field has set label then field_name = label
        3.2.2. If field has set attribute=True then add this to 
               dictionary A[__attributes__][field_name] = serializer_value

4. Resolve function serialized_value
    4.1. If Fs (and A) is empty:
        4.1.1. If function field_name returns None then return serialized_value
        4.1.2. Else return {field_name() : serialized_value()}
    4.2. Else
        4.2.1. If function field_name returns None then raise Exception
        4.2.2. Else  A.update({field_name() : serialized_value()})

5. Return A
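Step 4 is the subtle part of Field.serialize, so here is a small sketch of it. It is only an approximation under stated assumptions: sub-fields live in a hypothetical `subfields` dict, attribute handling and iterables are left out, and `TypeField`, `KeyField`, `BrokenField` and `Thing` are invented for the example.

```python
class Field:
    subfields = {}  # hypothetical registry of declared sub-fields

    def field_name(self, obj, field_name):
        return None  # by default the main value has no key of its own

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name)

    def serialize(self, obj, field_name):
        # Step 3: serialize declared sub-fields into dictionary A.
        result = {name: sub.serialize(obj, field_name)
                  for name, sub in self.subfields.items()}
        key = self.field_name(obj, field_name)
        value = self.serialized_value(obj, field_name)
        if not result:                        # step 4.1: no sub-fields
            return value if key is None else {key: value}
        if key is None:                       # step 4.2.1
            raise ValueError("a field with sub-fields must name its value")
        result[key] = value                   # step 4.2.2
        return result                         # step 5


class TypeField(Field):
    def serialized_value(self, obj, field_name):
        return type(getattr(obj, field_name)).__name__


class KeyField(Field):
    subfields = {"type": TypeField()}

    def field_name(self, obj, field_name):
        return "value"


class BrokenField(Field):
    subfields = {"type": TypeField()}  # sub-fields but no field_name


class Thing:
    key = "abc"


plain = Field().serialize(Thing(), "key")
nested = KeyField().serialize(Thing(), "key")
```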

We now have a dict (or list of dicts) from the first phase of serialization. Next, 
__attributes__ must be resolved (this depends on the format and strategy).



Deserialization (an early idea):

SomeSerializer.deserialize(D - python_native_datatype_objects (dict or 
list of dicts), instance=None, field_name=None, class_name=None, **options)


1. Get object instance # Resolving this may be more complicated than I 
   wrote below (e.g. based on D fields - duck typing)
    1.1. If instance is not None then use it
    1.2. Else try to resolve class_name
        1.2.1. If class_name is a class object, instantiate it.
        1.2.2. If class_name is a string then find the string value for this 
               key in D and instantiate it
        1.2.3. If class_name is None raise Exception

2. Find all fields in D and find the fields in Serializer for deserializing them
    2.1. Resolve the label attribute for fields

3. Pass instance, data D and field_name to all field Serializers

4. Return instance
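A compact sketch of these deserialization steps, assuming a single dict as input, no labels, and fields set directly with setattr. The `registry` argument (a mapping from stored type names to classes) and the `Person` class are hypothetical additions for illustration.

```python
def deserialize(data, instance=None, class_name=None, registry=None):
    """Build or update an object from a dict of native Python values."""
    fields = dict(data)  # copy so the caller's dict is left untouched
    registry = registry or {}
    # When class_name is a string it names the key in the data that stores
    # the type, so that key is removed from the ordinary fields.
    type_name = None
    if isinstance(class_name, str):
        type_name = fields.pop(class_name, None)
    if instance is None:                                  # step 1.2
        if class_name is None:                            # step 1.2.3
            raise ValueError("no class for deserialization provided")
        if isinstance(class_name, str):                   # step 1.2.2
            cls = registry[type_name]
        else:                                             # step 1.2.1
            cls = class_name
        instance = cls()
    for name, value in fields.items():                    # steps 2 and 3
        setattr(instance, name, value)
    return instance                                       # step 4


class Person:
    pass


by_key = deserialize({"model": "person", "name": "Piotr"},
                     class_name="model", registry={"person": Person})
by_class = deserialize({"name": "Tom"}, class_name=Person)
```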


I'm aware that there will be a lot of small issues, but I believe that 
the ideas are good.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-27 Thread Piotr Grabowski

Hi,

This week I started coding my project. It's available on the branch 
soc2012-serialization at https://github.com/grapo/django.


I'm not very familiar with git so I'm not sure that I'm doing it right:
* I forked the django repo on GitHub
* cloned it to my computer
* created a new branch soc2012
* work in this branch
* push it to origin

When I want to synchronize my branch with Django trunk, I will fetch 
master from upstream (django/django) and merge master into my branch. 
Is this flow good?


So far I have coded the base for Serializers and Fields. I haven't included any 
tests or documentation, so it can be hard to try it. I am pretty sure that 
writing appropriate docstrings will be a challenge for me :) I copied 
some metaclass code from Django forms and models. You can instantiate 
ObjectSerializer and try to serialize some simple Python objects with 
it. It will serialize all fields present in object.__dict__ and 
return Python native datatypes. The code is still in an early phase, so it's 
not polished and needs some refactoring, but if you have some tips for me 
I will be very grateful.


Next week I will fix some issues, code ModelSerializer, and write 
documentation and tests for what I have done so far. I must also think about 
renaming some functions so the API will be more convenient.


--
Piotr Grabowski

--
You received this message because you are subscribed to the Google Groups "Django 
developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Customizable Serialization check-in

2012-05-30 Thread Piotr Grabowski

On 29.05.2012 02:28, Russell Keith-Magee wrote:

Hi Piotr;

Apologies for the delay in responding to your updated API.

On Tue, May 22, 2012 at 6:59 AM, Piotr Grabowski  wrote:

I do some changes to my previous API: (https://gist.github.com/2597306<-
change are included)

  * which fields of object are default serialized. It's depend on
include_default_field but opposite to Tom Christie solution it's default
value is True so all fields (eventually specified in Meta.model_fields) are
present

Field options:
~~

  * There's a complication here that doesn't make sense to me.
Following your syntax, the following would appear to be legal:

class FieldA(Field):
 def serialize(…):
 def deserialize(…):

class FieldB(Field):
 to = FieldA()

 def serialize(…):
 def deserialize(…):

class FieldC(Field):
 to = FieldB(attribute=True)

 def serialize(…):
 def deserialize(…):

i.e., if Field allows declaration style definitions, and Field can be
*used* in declaration style definitions, then it's possible to define
them in a nested fashion -- at which point, it isn't clear to me what
is going to be output.

It seems to me that "attribute" shouldn't be an option on a field
declaration; it should either be something that's encompassed in a
similar way to serialise/deserialize (i.e., either additional
input/output from the serialise methods, or a parallel pair of
methods), or the use of a Field as a declarative definition implies
that it is of type attribute, and prevents the use of field types that
themselves have attributes.

In the example that you present I thought about raising an exception when 
FieldC is defined. Another option is to define the class itself as being an attribute:
 
class FieldB(Field):
    to = FieldA()

    def serialize(…):
    def deserialize(…):

    class Meta:
        attribute = True

Then raise an exception when FieldB is defined, because of the 'to' field. Still, one 
of my principles is to have one Serializer for all formats (or at least the 
possibility to serialize a Serializer in each format), and attribute is something 
really problematic.

About the value returned by Field.serialize (Serializer.serialize in general): now 
it is a dict with the key __attributes__; maybe it would be better to return a tuple 
(dict/field_value, attributes_dict), because of the issues when there is no field_name 
and attributes are present.





Field methods:
~~~

  * serialize_value(), deserialize_value(); this is bike shedding, but
is there any reason to not use just "serialize() and deserialize()"?
I'm using serialize and deserialize in my code. 
Serializer.serialize(...) returns a native Python datatype. It's a matter 
of naming, but in my opinion serialize is the method that should return the 
serialized Field/ObjectSerializer, not only part of the result 
(serialized_value returns only part of the data needed for Field serialization).




ObjectSerializer methods:

  * Why does ObjectSerializer have options at all? How can it be "meta"
operating on a generic object? Consider -- if you pass in an instance
of an object, you'll need to use obj.field_name to access fields; if
you pass in a dictionary, you'll need to use obj['field_name']. And if
you're given a generic object what's the list of default fields to
serialize?

Like I said last time, ObjectSerializer should be completely
definition based. Look at Django's Form base class - it has no "meta"
concept -- it's fully declaration based. Then there's ModelForm, which
has a meta class; but the output of the ModelForm could be completely
manually generated using a base Form.
OK, I think I finally get this idea. Before, I thought of class Meta 
more as options for the class that contains it. ObjectSerializer is now more 
like ModelForm than like Form. I have an idea of how to rewrite it and I will 
let you know when it is done.

  * I mentioned this last time -- why is class_name a meta option,
rather than a method on the base class with a default implementation?
Having it as an Meta attribute
I answered you last time; I should add this to the proposal. Probably I
don't understand the issue.


def get_class(self, data):
    if self._meta.class_name is not None:
        if isinstance(self._meta.class_name, str):
            # class_name is the key under which the class is stored in data
            return object_from_string(data[self._meta.class_name])
        else:
            return self._meta.class_name
    raise Exception('No class for deserialization provided')

If someone wants a more sophisticated way of resolving the class from the
data, they can override get_class.


When I rewrite ObjectSerializer it will be different from this, but my
idea is to have class_name as a shortcut for writing a get_class method.
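The relationship between the class_name shortcut and an overridable get_class can be sketched as below. This is only an illustration in modern Python: REGISTRY stands in for object_from_string, and Point, BaseSerializer and the three subclasses are hypothetical names, not code from the branch.

```python
class Point:
    """Toy class used only for this illustration."""


# Hypothetical registry standing in for object_from_string().
REGISTRY = {"Point": Point}


class BaseSerializer:
    class_name = None  # either a class, or a key into the serialized data

    def get_class(self, data):
        if self.class_name is None:
            raise ValueError("No class for deserialization provided")
        if isinstance(self.class_name, str):
            # The string names the key that stores the class identifier.
            return REGISTRY[data[self.class_name]]
        return self.class_name


class FixedSerializer(BaseSerializer):
    class_name = Point  # shortcut: always build Point


class KeyedSerializer(BaseSerializer):
    class_name = "type"  # shortcut: read the class from data["type"]


class CustomSerializer(BaseSerializer):
    def get_class(self, data):
        # More sophisticated resolution: override the method instead.
        return Point if "x" in data else dict
```

The two shortcut forms cover the common cases declaratively, while the method stays available for anything that depends on the data itself.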




  * I'm not wild about the way related_serializer seems to work,
either. Again, like class_name, it seems like it should be a method,
not an option. By making it an option, you're a

Re: Customizable Serialization check-in

2012-06-04 Thread Piotr Grabowski

Hi,

Sorry for being late with the weekly update. Due to some issues with Meta
and my wrong understanding of metaclasses, which Russell pointed out, I spent
time improving my knowledge of them. I also rewrote some parts of the
code that I had written the week before.
This week I will do what I was supposed to do last week: initial tests and
documentation. After this week serialization should work with simple
objects.



--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-06-11 Thread Piotr Grabowski

Hi!

This week I managed to write deserialization functions and tests.

*Issues with deserialization*
Working on deserialization gave me a lot of thoughts about the previous
concepts. I rewrote the Field class so that a Field can no longer be nested.
A Field can only have subfields if those subfields are attributes:

class ContentField(Field):
    title = Field(attribute=True)  # valid
    content = Field()  # invalid -> raises an exception at class declaration time

    def serialized_value(...):
        ...

Of course, if ContentField is initialized as an attribute and has subfields,
an exception is raised (when ContentField is initialized).
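One way to get both checks is a metaclass for the declaration-time rule and a check in __init__ for the instantiation rule. The sketch below is an assumption about how this could be done in modern Python, not the code in the branch; FieldMeta and the subfields attribute are hypothetical names, and inherited subfields are ignored to keep it short.

```python
class FieldMeta(type):
    """Collect subfields and reject any that are not declared as attributes."""

    def __new__(mcs, name, bases, namespace):
        subfields = {}
        for attr_name, value in namespace.items():
            # Instances of classes built by FieldMeta are fields.
            if isinstance(type(value), FieldMeta):
                if not value.attribute:
                    raise TypeError(
                        "%s.%s must be declared with attribute=True"
                        % (name, attr_name)
                    )
                subfields[attr_name] = value
        cls = super().__new__(mcs, name, bases, namespace)
        cls.subfields = subfields
        return cls


class Field(metaclass=FieldMeta):
    def __init__(self, label=None, attribute=False):
        # A field that has subfields cannot itself be used as an attribute.
        if attribute and self.subfields:
            raise ValueError("a field with subfields cannot be an attribute")
        self.label = label
        self.attribute = attribute


class ContentField(Field):
    title = Field(attribute=True)  # valid
```

Declaring `content = Field()` inside a Field subclass fails while the class body is processed, and `ContentField(attribute=True)` fails when the instance is created.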


I changed the Python datatype format returned from the serializer.serialize
method. Previously it was a dict of serialized fields (with the label or
field name as key) plus a special key __attributes__ holding a dict of
attributes. Now it is a tuple (native, attributes), where native is a dict
of serialized fields (or a generator of dicts).


serializer.deserialize always returns an object instance.

After the first phase of serialization, python_serialized_object will be
serialized by a NativeFormat instance. Each format (json, xml, yaml, ...)
has one NativeFormat that translates python_serialized_object to
serialized_string. I want to be able to do this:
object -> python_serial = object_serializer.serialize(object) ->
string_serial = native_format.serialize(python_serial) ->
python_deserial = native_format.deserialize(string_serial) -> object2 =
object_serializer.deserialize(python_deserial)

object2 has the same content as object.

Now I have:
object -> python_serial = object_serializer.serialize(object) ->
object2 = object_serializer.deserialize(python_serial)
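The intended four-step round trip can be illustrated with a toy example. Everything here is a stand-in: Point, PointSerializer and JsonFormat are hypothetical, and the standard json module plays the role of the NativeFormat for JSON.

```python
import json


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class PointSerializer:
    """First phase: object <-> native Python datatypes."""

    def serialize(self, obj):
        return {"x": obj.x, "y": obj.y}

    def deserialize(self, native):
        return Point(native["x"], native["y"])


class JsonFormat:
    """Second phase: native Python datatypes <-> string."""

    def serialize(self, native):
        return json.dumps(native, sort_keys=True)

    def deserialize(self, text):
        return json.loads(text)


object_serializer, native_format = PointSerializer(), JsonFormat()
original = Point(1, 2)

python_serial = object_serializer.serialize(original)
string_serial = native_format.serialize(python_serial)
python_deserial = native_format.deserialize(string_serial)
restored = object_serializer.deserialize(python_deserial)
```

Because each phase only sees the output of the previous one, the object serializer never needs to know which string format is in use.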


*Tests*
I wrote some tests (NativeSerializersTests) for ObjectSerializer in
django/tests/modeltests/serializers/tests.py, but I'm not sure this is a
good place for them. I used a model (Article) defined in models.py, but I
used it like a normal object. Relation fields aren't serialized properly.


So far I have tested the most important functions of ObjectSerializer:
creating custom fields, attributes, and renaming fields (using labels).


Next I want to resolve issues with:

 * Instance creation when deserializing. I have a create_instance method
   and Meta.class_name. I must turn them into a public API.
 * Ensuring that the Field serialize method always returns simple native
   Python datatypes
 * Writing a NativeFormat for (at least) json
 * Finding better names for the already defined classes, methods and files
 * More tests and documentation

When I have done this, serialization and deserialization will be more or
less done for (non-model) Python objects.



--
Piotr Grabowski








Re: Customizable Serialization check-in

2012-06-19 Thread Piotr Grabowski

Hi!

This week I wrote simple serialization and deserialization for the json
format, so it's now possible to encode objects to json and decode them back:



import django.core.serializers as s

class Foo(object):
    def __init__(self):
        self.bar = [Bar(), Bar(), Bar()]
        self.x = "X"

class Bar(object):
    def __init__(self):
        self.six = 6

class MyField2(s.Field):
    def deserialized_value(self, obj, instance, field_name):
        pass

class MyField(s.Field):
    x = MyField2(label="my_attribute", attribute=True)

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name, "No field like this")

    def deserialized_value(self, obj, instance, field_name):
        pass

class BarSerializer(s.ObjectSerializer):
    class Meta:
        class_name = Bar

class FooSerializer(s.ObjectSerializer):
    my_field = MyField(label="MYFIELD")
    bar = BarSerializer()

    class Meta:
        class_name = Foo


foos = [Foo(), Foo(), Foo()]
ser = s.serialize('json', foos, serializer=FooSerializer, indent=4)
new_foos = s.deserialize('json', ser, deserializer=FooSerializer)


There are cases that I don't like:

 * A deserialized_value function with an empty body: what to do with
   fields that we don't want to deserialize? There should be a better
   way to handle this.
 * I pass in a list (foos) but get back a generator (new_foos); bar in
   the Foo object is also a generator, not a list as in the input.
   Generators are better for performance, but if I put a list in I want
   a list out, not a generator. I don't know what to do with this.
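For the list-versus-generator question, one possible approach (an assumption, not something the branch does) is to stay lazy internally and materialize the result only when the input container was a list. The helper names below are hypothetical.

```python
import types


def deserialize_items(items, build):
    """Lazily build one object per serialized item."""
    return (build(item) for item in items)


def deserialize_like_input(items, build):
    """Return a list for list input, otherwise stay lazy."""
    result = deserialize_items(items, build)
    if isinstance(items, list):
        return list(result)
    return result


as_list = deserialize_like_input([1, 2], str)
as_generator = deserialize_like_input(iter([1, 2]), str)
```

Since a parsed JSON array is already a list in Python, this would give list in, list out for the json format while keeping streaming inputs lazy.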


Next week I will handle the rest of the issues that I mentioned in last
week's check-in and refactor json (de)serialization: usage of streams
and proper handling of parameters (like indent, etc.)


--
Piotr Grabowski







Re: Customizable Serialization check-in

2012-06-20 Thread Piotr Grabowski

On 20.06.2012 13:50, Tom Christie wrote:


>deserialized_value function with empty content

Are you asking about how to be able to differentiate between a field 
that deserializes to `None`, and a field that doesn't deserialize a 
value at all?
No :) I had this problem before and I managed to resolve it: the default
deserialized_value doesn't return anything. It sets the field value.

def deserialized_value(self, obj, instance, field_name):
    setattr(instance, field_name, obj)

This is how I do deserialization: pass the instance to the subfields,
get it back from them (it should be the same instance, but in specific
cases, e.g. an immutable instance, I can imagine that another instance of
the same class is returned) and return it.


If I don't declare a deserialized_value function, the function from the
base class is used. That's the expected behavior. So how do I say "this
field shouldn't be deserialized"? Right now I declare:

def deserialized_value(self, obj, instance, field_name):
    pass

In fact I can do anything in this function except set a value on the
instance, but declaring a function only to say "do nothing" isn't a good
solution for me.




> I changed python datatype format returned from serializer.serialize 
method.  Now it is tuple (native, attributes)


I'm not very keen on either this, or on the way that attributes are 
represented as fields.
To me this looks like taking the particular requirements of 
serializing to xml, and baking them deep into the API, rather than 
treating them as a special case, and dealing with them in a more 
decoupled and extensible way.


For example, I'd rather see an optional method `attributes` on the 
`Field` class that returns a dictionary of attributes.  You'd then 
make sure that when you serialize into the native python datatypes 
prior to rendering, you also have some way of passing through the 
original Field instances to the renderer in order to provide any 
additional metadata that might be required in rendering the basic 
structure.


Wiring up things this way around lets you support other formats that 
have extra information attached to the basic structure of the data. 
 As an example use-case - In addition to json, yaml and xml, a 
developer might also want to be able to serialize to say, a tabular 
HTML output.  In order to do this they might need to be able attach 
template_name or widget information to a field, that'd only be used if 
rendering to HTML.


It might be that it's a bit late in the day for API changes like that, 
but hopefully it at least makes clear why I think that treating XML 
attributes as anything other than a special case isn't quite the right 
thing to do.  - Just my personal opinion of course :)


Regards,

  Tom



You are right that I shouldn't treat attributes as so special. I have an
idea how to fix this. Where I returned (native, attributes) I will return
(native, metainfo). It's only a matter of renaming, but metainfo will be
more than attributes. In xml, metainfo can contain attributes for a field;
in html it can be a template_name or widget for rendering. If I don't use
metainfo in my serializer class then it's still universal: it can be used
for serialization to any format.


How to create metainfo? Having a `metainfo` method in the `Field` class that
returns a dictionary seems to be a good idea, and it is for these html
use-cases. But what to do with xml attributes again? :) They aren't only
field meta information; they can also contain instance information that is
valuable in deserialization (like the instance pk in the current Django
solution), so they should be treated as fields and should have access to
the instance in serialization and deserialization.


My last thought is that attributes should be treated as normal fields
and live in the tuple's native object, and metainfo will tell xml
which fields in native should be rendered as attributes.

After the first phase:
native == {
    'field_1': value1,
    'field_2': value2,
    'field_3': value3,
}
metainfo == {
    'as_attributes': ['field_2', 'field_3'],
    'template_name': 'my_template'
}

So if we use json in the second phase, field_2 and field_3 will be rendered
the same way as field_1, because json doesn't read metainfo. Xml will render
fields according to metainfo['as_attributes']. Html will render the native
dict using my_template.
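A small sketch of how two second-phase renderers could treat the same (native, metainfo) pair. The render_json and render_xml functions and the "object" tag name are assumptions for illustration; only the json and xml.etree.ElementTree calls are real standard-library APIs.

```python
import json
import xml.etree.ElementTree as ET

native = {"field_1": "a", "field_2": "b", "field_3": "c"}
metainfo = {
    "as_attributes": ["field_2", "field_3"],
    "template_name": "my_template",  # only meaningful to an HTML renderer
}


def render_json(native, metainfo):
    # JSON has no notion of attributes, so metainfo is ignored.
    return json.dumps(native, sort_keys=True)


def render_xml(native, metainfo, tag="object"):
    # Fields listed in metainfo become XML attributes, the rest child elements.
    as_attributes = set(metainfo.get("as_attributes", []))
    root = ET.Element(tag)
    for name, value in native.items():
        if name in as_attributes:
            root.set(name, str(value))
        else:
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = render_json(native, metainfo)
xml_text = render_xml(native, metainfo)
```

Both renderers receive identical input; only the XML one consults metainfo, which keeps the first phase format-agnostic.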


--
Piotr Grabowski


On Tuesday, 19 June 2012 21:48:37 UTC+1, Piotr Grabowski wrote:

Hi!

This week I wrote simple serialization and deserialization for
json format so it's possible now to encode objects from and to json:


import django.core.serializers as s

class Foo(object):
    def __init__(self):
        self.bar = [Bar(), Bar(), Bar()]
        self.x = "X"

class Bar

Re: Customizable Serialization check-in

2012-06-28 Thread Piotr Grabowski

On 26.06.2012 11:52, Tom Christie wrote:

> It is the way I am doing deserialization - pass instance to subfields

Seems fine.  It's worth keeping in mind that there's two ways around 
of doing this.


1. Create an empty instance first, then populate it with the field 
values in turn.
2. Populate a dictionary with the field values first, and then create 
an instance using those values.


The current deserialization does something closer to the second.
I don't know if there's any issues with doing things the other way 
around, but you'll want to consider which makes more sense.


The second approach assumes that every field returns some value. But what if
we don't want to deserialize some field? In my deserialization the instance
is passed to the field, and the field eventually fills it with some value.

def deserialize_value(self, obj, instance, field_name):
    setattr(instance, field_name, obj)

If we don't want to deserialize a field, we simply do nothing in
deserialize_value.
If the second approach is used we must return a value. One idea is to mark
the field as not deserializable:

class MyField(Field):
    deserializable = False
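With a deserializable flag, the second approach (collect values first, then create the instance) could look roughly like this. The deserialize function, ReadOnlyField and Article are hypothetical names used only for the sketch.

```python
class Field:
    deserializable = True

    def deserialize_value(self, value):
        return value


class ReadOnlyField(Field):
    deserializable = False  # serialized, but ignored on the way back


class Article:
    def __init__(self, title=None, slug=None):
        self.title = title
        self.slug = slug


def deserialize(fields, data, cls):
    """Collect values first, then create the instance from them."""
    values = {
        name: field.deserialize_value(data[name])
        for name, field in fields.items()
        if field.deserializable and name in data
    }
    return cls(**values)


fields = {"title": Field(), "slug": ReadOnlyField()}
article = deserialize(fields, {"title": "Hello", "slug": "hello"}, Article)
```

The flag replaces the empty method: skipping a field becomes a declaration rather than an override that does nothing.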


> Where I returned (native, attributes) I will return (native, 
metainfo). It's only matter of renaming but metainfo will be more than 
attributes.


Again, there's two main ways around I can think of for populating 
metadata such as xml attributes.


1. Return the metadata upfront to the renderer.
2. Include some way for the renderer to get whatever metadata it needs 
at the point it's needed.


This is one point where what I'm doing in django-serializers differs 
from your work, in that rather than return extra metadata upfront, the 
serializers return a dictionary-like object (that e.g. can be directly 
serialized to json or yaml), that also includes a way of returning the 
fields for each key (so that e.g. the xml renderer can call 
field.attributes() when it's rendering each field.)


Again, you might decide that (1) makes more sense, but it's worth 
considering.


As ever, if there's any of this you'd like to talk over off-list, feel 
free to drop me a mail - t...@tomchristie.com


Regards,

  Tom


I rewrote this so it's more similar to django-serializers.
But from the beginning: what did I do this week? :)
I agree that xml attributes are overemphasized in my solution, so I want
to change that. Attributes in xml are one of (two) ways of presenting
information. I still want to have fields for attributes, but done this
way:


class MyField(Field):
    attr1 = Field()
    attr2 = Field()

    def serialized_value(self, obj, field_name):
        return field_value

    def metainfo(self):
        return {'attributes': ['attr1', 'attr2']}


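JSON will skip the attributes entirely:
some_field : field_value

XML will render it as:

<some_field attr1="val1" attr2="val2">field_value</some_field>


If metainfo doesn't return a dict with attributes, XML will render this:

<some_field>
    <attr1>val1</attr1>
    <attr2>val2</attr2>
    field_value
</some_field>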


I coded it like django-serializers's DictWithMeta, but I added one more
feature to represent a Field that has subfields and one extra
value. I'm still not convinced it is a good solution, so I rewrote it
several times but always ended up with something like that :)

I will push the code tomorrow because I still want to do some tweaks.
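The idea of a dict that carries metadata plus one extra value for the field itself might be sketched as follows. MetaDict and serialize_field are hypothetical names; this is neither the django-serializers class nor the code from the branch.

```python
class MetaDict(dict):
    """A dict of serialized subfields that also remembers metadata."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metadata = {}
        self.value = None  # the field's own value, next to its subfields


def serialize_field(value, subfields, attributes=()):
    result = MetaDict(subfields)
    result.value = value
    result.metadata["attributes"] = list(attributes)
    return result


node = serialize_field(
    "field_value",
    {"attr1": "val1", "attr2": "val2"},
    attributes=["attr1", "attr2"],
)
```

A format that ignores metadata can treat node as a plain dict, while an XML renderer could read node.metadata and node.value to emit attributes and element text.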

--
Piotr Grabowski









Re: Customizable Serialization check-in

2012-07-10 Thread Piotr Grabowski

Hi,

It is time for the midterm evaluation of my participation in GSoC, so in
this check-in I want to summarize what I have done in the last month.
https://gist.github.com/3085250 - here is something that can serve as
"documentation". I wrote some examples of ModelSerializer usage and how
it should work.
https://github.com/grapo/django - the code that I wrote is in the
soc2012-serialization branch.


There are still problems with the API and with how to do some things, but
in my opinion it's going in the right direction.


Serialization and deserialization of Python objects is almost done.
The API is quite stable; I used some ideas (and a little code) from
https://github.com/tomchristie/django-serializers
Objects are serialized to metadicts, which are dicts with additional
data. This additional data can be used by the format serializer to change
the presentation of the data (e.g. attributes in xml).


Serialization of Django models is started. I don't know which fields of a
model should be serialized by default: certainly all fields declared in the
model. What about the pk field and reverse related fields?


The json dumpdata serializer is more or less written; I have not done
field sorting yet.


I am sure that I can finish all this work before GSoC ends.

Sadly, not everything is going well. In particular, my communication on
this list and with my mentor should be improved. It's all my fault. I should
have written check-ins more regularly and met the deadlines that I set. I am
not very satisfied with the progress I have made. Much more can be done
in about one and a half months.


Regards,
Piotr Grabowski








Re: Customizable Serialization check-in

2012-07-12 Thread Piotr Grabowski

On 11.07.2012 14:04, Russell Keith-Magee wrote:

There is still problem with API and how to do some things but in my opinion
it's going in right direction.

Generally, I agree. I still have some concerns however; mostly around
the things that you're putting onto the Meta class.

related_serializer, for example -- Why is this a single attribute in
the meta, rather than a method? By using an attribute, you're saying
that on any given serializer, *all* related objects will be serialised
the same, and I don't see why that should be the case.
Not *all* related objects, only those that aren't declared in the class
definition. I think the related_serializer attribute is useful when you want
to serialize all related objects in one way: to their primary key
value, to their natural key value, or to the dumpdata format. If you want to
make an exception for some fields, you declare them in the class definition.



class MySerializer(ModelSerializer):
    special_object = SpecialSerializer()

    class Meta:
        related_serializer = PkSerializer

In this case all related objects except special_object will be
serialized to their pk value.


What more would you do with a related_serializer method? If you want to
serialize some related objects with one serializer and some with another,
the simplest way is to declare this in the class definition.
I see only two cases where a method is needed: if you want to pick the
serializer by some pattern in the field name, or by the related object's
type (m2m, fk). Then you can override the
get_object_field_serializer(self, obj, field_name) method to do it.
By default this method returns related_serializer or field_serializer based
on the field type. Maybe it would be a good idea to split this method in
two, one for related objects and one for non-related ones. Overriding it
would then be very similar to setting an attribute in Meta, but I think
attributes are more
"declarative".

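The default behaviour and an override by field-name pattern could be sketched like this. All classes here are simplified, hypothetical stand-ins (plain class attributes replace Meta options, and a Related marker class replaces real fk/m2m detection); only the method name get_object_field_serializer comes from the discussion above.

```python
class Related:
    """Marker for a related object in this toy example."""

    def __init__(self, pk, name):
        self.pk, self.name = pk, name


class PkSerializer:
    def serialize(self, obj):
        return obj.pk


class NestedSerializer:
    def serialize(self, obj):
        return {"pk": obj.pk, "name": obj.name}


class FieldSerializer:
    def serialize(self, value):
        return value


class BaseSerializer:
    related_serializer = PkSerializer  # what Meta.related_serializer would set
    field_serializer = FieldSerializer

    def get_object_field_serializer(self, obj, field_name):
        # Default: choose by field type.
        if isinstance(getattr(obj, field_name), Related):
            return self.related_serializer()
        return self.field_serializer()

    def serialize(self, obj, field_names):
        return {
            name: self.get_object_field_serializer(obj, name).serialize(
                getattr(obj, name)
            )
            for name in field_names
        }


class ByNameSerializer(BaseSerializer):
    def get_object_field_serializer(self, obj, field_name):
        # Pick a serializer from a pattern in the field name.
        if field_name.endswith("_full"):
            return NestedSerializer()
        return super().get_object_field_serializer(obj, field_name)


class Book:
    def __init__(self):
        self.title = "T"
        self.author = Related(1, "Ann")
        self.editor_full = Related(2, "Bob")


result = ByNameSerializer().serialize(Book(), ["title", "author", "editor_full"])
```

The attribute covers the uniform case, and the method is only touched when the choice depends on the field itself.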

The same argument goes for class_name (which I think I've mentioned
before), field_serializer, and so on.

And there is a method for that :)

def create_instance(self, serialized_obj):
    if self.opts.class_name is not None:
        if isinstance(self.opts.class_name, str):
            return _get_model(serialized_obj[self.opts.class_name])()
        else:
            return self.opts.class_name()
    raise base.DeserializationError(u"Can't resolve class for object creation")


Maybe it isn't the proper way to do this, since there are two ways of doing
the same operation, but I think this is the simplest solution for the end
user.



The only fields that I can see
that *should* be declarative are 'fields' and 'exclude' -- and if
you've been tracking django-dev recently, there's been a discussion
about whether the idea of 'exclude' should be deprecated from Django
APIs (due to potential security issues -- explicit inclusion is safer
than implicit inclusion, because you can accidentally forget to
exclude sensitive data from an output list)
I have read this discussion. I'm +1 on deprecating 'exclude' :) Personally,
I almost never use it.




Some other API questions:

Why is deserialized_value decoupled from set_object? It isn't obvious
to me why this separation exists.
It's possible that I overcomplicated this. There are three methods:
set_object, deserialize and deserialize_value. When you want to
deserialize an object you should:
* Ensure that it is a proper object, not a list of objects or a dict (dicts
in deserialization are another problem, which I present below). The
'deserialize' method handles this: it recursively deserializes
lists and dicts.
* Do some processing on the object you get (e.g. change a string to an int).
The 'deserialize_value' method handles this.
* Set this object on the upper-level object. The 'set_object' method handles
this. There should rarely be a reason to override it.


I think deserialize_value is the method that users will most often
need to override.
I would be willing to merge deserialize and deserialize_value, but
set_object should be left as is.
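The division of labour between the three methods can be sketched as below. This is an illustration under simplifying assumptions: the signatures are hypothetical, only lists are recursed into (dicts are left out because of the ambiguity discussed next), and _walk is a helper invented for the sketch.

```python
class Field:
    def deserialize(self, data, instance, field_name):
        """Recurse through lists, convert leaves, then attach the result."""
        self.set_object(self._walk(data), instance, field_name)
        return instance

    def _walk(self, data):
        if isinstance(data, list):
            return [self._walk(item) for item in data]
        return self.deserialize_value(data)

    def deserialize_value(self, value):
        """Hook for per-value conversion; the usual override point."""
        return value

    def set_object(self, value, instance, field_name):
        """Attach the converted value to the parent object."""
        setattr(instance, field_name, value)


class IntField(Field):
    def deserialize_value(self, value):
        return int(value)


class Holder:
    pass


holder = IntField().deserialize(["1", ["2", "3"]], Holder(), "numbers")
```

A subclass such as IntField only overrides the conversion step; traversal and attachment stay in the base class.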


Problem with deserializing dicts:
In the current implementation there is no way to tell during deserialization
whether a given dict is a serialized object or a dict of objects. So it
might be better not to handle dicts automatically and leave that
decision to the user?


  
I see where you're going with metainfo on fields (and that's a

reasonably elegant way of tackling the problem of XML needing
additional info to serialize), but what is the purpose of metadata on
Serializers?

Yours, Russ Magee %-) 


Because a Serializer should also be able to give additional info
to the format serializer. For example, which fields should be treated as
attributes (pk and model in dumpdata).



--
Piotr Grabowski


Re: Customizable Serialization check-in

2012-08-06 Thread Piotr Grabowski

Hi,

In the past 3 weeks my project has changed a lot. First of all, I
changed the output of the first phase of serialization. Previously it was
Python native datatypes. At some point I added a dictionary with metadata
to it. The metadata was used in the second phase of serialization. Now the
first phase returns ObjectWithMetadata, which is a wrapper around Python
native datatypes. It's a bit hackish, so I don't know if it is a good
solution:


class ObjectWithMetadata(object):
    def __init__(self, obj, metadata=None, fields=None):
        self._object = obj
        self.metadata = metadata or {}
        self.fields = fields or {}

    def get_object(self):
        return self._object

    def __getattribute__(self, attr):
        if attr not in ['_object', 'metadata', 'fields', 'get_object']:
            return self._object.__getattribute__(attr)
        else:
            return object.__getattribute__(self, attr)

    # there are a few more methods like this (for acting like a
    # MutableMapping and Iterable) and all are similar

    def __getitem__(self, key):
        return self._object.__getitem__(key)

    ...

Thanks to this solution, ObjectWithMetadata acts like the object stored
in _object in almost all cases (also in isinstance tests), and there is
a place for storing additional data.


I didn't change deserialization, so its output is Python native
datatypes without wrapping. I don't know if this is good, because there
is no symmetry in this:
Django object -> python native datatype packed in ObjectWithMetadata ->
json -> python native datatype -> Django object



I have all dumpdata formats working now (xml, json, yaml). All tests
pass, but there is a problem with the order of fields in yaml. It will be
fixed soon.
I made a new format, new_xml, which is similar to json and yaml. It's easier
to parse.


Old:

 rel="ManyToOneRel">1
rel="ManyToManyRel">






New:

 
  1
   
   1
   2
   



There is also a problem with json and serialization to a stream: json
uses extensions written in C (_json) for performance, and this leads
to exceptions when ObjectWithMetadata is used. So before objects are passed
to json.dump they should be unpacked from ObjectWithMetadata.
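Unpacking before handing data to the json module could be done with a small recursive helper. The sketch below assumes a simplified ObjectWithMetadata (the attribute-proxying __getattribute__ is left out), and unwrap is a hypothetical name.

```python
import json


class ObjectWithMetadata(object):
    """Simplified wrapper: the attribute proxying is omitted here."""

    def __init__(self, obj, metadata=None, fields=None):
        self._object = obj
        self.metadata = metadata or {}
        self.fields = fields or {}

    def get_object(self):
        return self._object


def unwrap(value):
    """Recursively replace wrappers with the plain objects they hold."""
    if isinstance(value, ObjectWithMetadata):
        value = value.get_object()
    if isinstance(value, dict):
        return {key: unwrap(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [unwrap(item) for item in value]
    return value


wrapped = ObjectWithMetadata(
    {"pk": 1, "tags": ObjectWithMetadata(["a", "b"], metadata={"rel": "m2m"})},
    metadata={"attributes": ["pk"]},
)
text = json.dumps(unwrap(wrapped), sort_keys=True)
```

The encoder then only ever sees real dicts and lists, so it does not matter whether the C or the pure-Python implementation is used.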



There is probably no chance of achieving one of the most important
requirements that I specified: using only one Serializer to serialize Django
models to multiple formats:

serializers.serialize('json', objects, serializer=MySerializer)
serializers.serialize('xml', objects, serializer=MySerializer)

The trouble is with xml (like always ;). In xml every (model) field must be
converted to a string before serializing in the xml serializer. In json and
yaml, if a field has a protected type (string, int, datetime etc.) then
nothing is done with it. The conversion is done in the first phase because
only there do we have access to field.value_to_string, the field method that
is used to convert the field value to a string. It can be overridden by the
user, so simply calling smart_unicode in the second phase instead isn't
enough.



Most important tasks in TODO:
 * handling natural keys
 * tests
   x correctness
   x performance (I suspect my solution will be slower than the one
     currently used in Django, but by how much?)
 * documentation

https://github.com/grapo/django/tree/soc2012-serialization/django/core/serializers
--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-08-22 Thread Piotr Grabowski

Hi,

Google Summer of Code has almost ended. I was working on customizable
serialization. This project was a lot harder than I expected, and sadly,
in my opinion, I failed to do it right. I want to apologize for that, and
especially for my poor communication with this group and my mentor. I
wanted to improve it after the midterm evaluation, but it only got worse.


I don't think my project is all wrong, but a lot of things are
different from what I planned. This is how it looks (I wrote more in the
documentation):
There is a Serializer class that is made of two classes: NativeSerializer
and FormatSerializer.
NativeSerializer handles serialization and deserialization of Python objects
to/from native Python datatypes.
FormatSerializer handles serialization and deserialization of native Python
datatypes to/from some format (xml, json, yaml).


I wanted NativeSerializer to be fully independent of FormatSerializer
(and vice versa), but this isn't possible. Either NativeSerializer must
return some additional data, or FormatSerializer must give
NativeSerializer some context. For example, in xml all Python native
datatypes must be converted to strings before serializing to xml. Some
custom model fields can have a more sophisticated way of converting to a
string than unicode(), so `field.value_to_string` must be called, and
`field` is only accessible in the NativeSerializer object. So either
NativeSerializer also returns `field`, or FormatSerializer
informs NativeSerializer that it handles only text data.
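The second option, where the format passes a small context to the native phase, might look like this. The text_only key, the context dicts and the ModelField stand-in are assumptions made for the sketch; only the value_to_string name comes from the text above, and its signature is simplified here.

```python
class ModelField:
    """Stand-in for a model field with a custom string conversion."""

    def __init__(self, name):
        self.name = name

    def value_to_string(self, value):
        return "<%s>" % value


class NativeSerializer:
    def __init__(self, fields):
        self.fields = fields

    def serialize(self, data, context):
        if context.get("text_only"):
            # Only this phase can reach the field objects.
            return {f.name: f.value_to_string(data[f.name]) for f in self.fields}
        return {f.name: data[f.name] for f in self.fields}


class JsonFormat:
    context = {"text_only": False}


class XmlFormat:
    context = {"text_only": True}


native = NativeSerializer([ModelField("count")])
for_json = native.serialize({"count": 3}, JsonFormat.context)
for_xml = native.serialize({"count": 3}, XmlFormat.context)
```

The native phase never learns the format's name, only a capability flag, which limits but does not remove the coupling described above.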


Backward compatible dumpdata is almost working. Only a few tests fail,
but I am not sure why.


Nested serialization of fk and m2m related fields, which was the main
functionality of this project, is working but not well tested. There are
some issues, especially with xml. I must write a new xml format because the
old one won't work with nested serialization.


I didn't do any performance tests. Running the full test suite takes about
40 seconds longer with my serialization (about 1500 s in total), if I
remember correctly.


I will try to complete this project so it will be at least bug free and
usable. If someone is interested in using nested serialization, there is
another great project: https://github.com/tomchristie/django-serializers


Code: https://github.com/grapo/django/tree/soc2012-serialization
Documentation: https://gist.github.com/3085250

--
Piotr Grabowski




Re: Moving forward with Serialization.

2012-09-01 Thread Piotr Grabowski

On 31.08.2012 10:25, Tom Christie wrote:
> I personally think that Forms are already the place that should 
handle (de)serialisation. They already serialise to HTML: why should 
they not be able to serialise to other stream types?


Conceptually I agree.  As it happens django-serializers is perfectly 
capable of rendering into HTML forms, I just haven't yet gotten around 
to writing a form renderer, since it was out-of-scope of the fixture 
serialization functionality.


Pragmatically, I'm not convinced it'd work very well.  The existing 
Forms implementation is tightly coupled to form-data input and HTML 
output, and I think trying to address that without breaking 
backwards compatibility would be rather difficult.  It's maybe easy 
enough to do for flat representations, and pk relationships, but 
extending it to deal with nested representations, being able to use a 
Form as a field on another Form, and representing custom relationships 
would all take some serious hacking.  My personal opinion is that 
whatever benefits you'd gain in DRYness, you'd lose in code 
complexity.  Having said that, if someone was able to hack together a 
Forms-based fixture serialization/deserialization implementation that 
passes the Django test suite, and didn't look too kludgy, I'd be 
perfectly willing to revise my opinion.


I am not quite sure, but I think Forms should be built on top of some
serialization API, not the other way around. Forms are a more specific kind
of model serialization: they are models serialized to html (a specific
format) with some validation (specific actions) when deserializing.



I like Tom's django-serializers, but there are some things that I want to
mention:


* The process of serialization is split into two parts: transformation to
Python native datatypes (serializer) and then to a specific text format
(renderer). But when serializing, the Field is also saved with the data, so
it's not so clean. I also had issues with this, but I resolved them in a
different way (not to say better :)


* In the master branch the Serializer is closely tied to the Renderer, so
if there is a different Renderer class then a new Serializer is needed. In
the forms branch it is done in the __init__ serialize method, and this must
be rewritten for backward compatibility if django-serializers goes into
core. I want to propose my solution [1]:
For each format there is a Serializer class which is made from a
NativeSerializer (from models to Python native datatypes) and a
FormatSerializer (Renderer)


class Serializer(object):
    # class for native python serialization/deserialization
    SerializerClass = NativeSerializer
    # class for specific format serialization/deserialization
    RendererClass = FormatSerializer

    def serialize(self, queryset, **options):
        ...

    def deserialize(self, stream_or_string, **options):
        ...

Deserializer = Serializer

This is fully backward compatible and the user can do:
serializers.serialize('registered_format', objects,
serializer=MyNativeSerializer)


This will make a new Serializer class with SerializerClass ==
MyNativeSerializer. In this solution NativeSerializer and
FormatSerializer are more independent. Each
NativeSerializer can be rendered by each FormatSerializer, but it's not so
simple. FormatSerializer provides NativeSerializer with some context, so
you could say that NativeSerializer knows which format it will be serialized
to. It's not exactly the format but some metadata about it. I am not proud
of this :/
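How the serializer= argument could produce a new Serializer class is sketched below. The REGISTERED dict, the module-level serialize function and the toy classes are hypothetical; the sketch only shows the idea of deriving a subclass that swaps SerializerClass while keeping the format's RendererClass.

```python
import json


class NativeSerializer:
    def serialize(self, obj):
        return dict(vars(obj))


class JsonFormatSerializer:
    def serialize(self, native_objects):
        return json.dumps(native_objects, sort_keys=True)


class Serializer:
    SerializerClass = NativeSerializer
    RendererClass = JsonFormatSerializer

    def serialize(self, objects, **options):
        native = self.SerializerClass()
        return self.RendererClass().serialize(
            [native.serialize(obj) for obj in objects]
        )


REGISTERED = {"json": Serializer}  # hypothetical format registry


def serialize(format_name, objects, serializer=None, **options):
    base = REGISTERED[format_name]
    if serializer is not None:
        # Derive a new Serializer class that only swaps the native phase.
        base = type("CustomSerializer", (base,), {"SerializerClass": serializer})
    return base().serialize(objects, **options)


class TitleOnly(NativeSerializer):
    def serialize(self, obj):
        return {"title": obj.title}


class Article:
    def __init__(self, title, body):
        self.title, self.body = title, body


articles = [Article("Hi", "text")]
default_output = serialize("json", articles)
custom_output = serialize("json", articles, serializer=TitleOnly)
```

Existing callers that pass only a format name keep working, which is where the backward compatibility comes from.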


* IMO there is a bug related to xml. All model fields must be transformed to
text before xml serialization. In the current Django serialization framework
the field's value_to_string method is responsible for this. In
django-serializers this method is not always called, so it can lead to
errors with custom model fields.


[1] 
https://github.com/grapo/django/tree/soc2012-serialization/django/core/serializers


--
Piotr Grabowski
