On Mon, Dec 13, 2010 at 09:55:18PM +0800, Jeremy Kerr wrote:
> Hi Eduardo,
>
> > Why not just handle and store the patch as an array of bytes (Python
> > 'str' type) instead of a unicode string?
>
> Basically, because we need to process the patch itself; either extracting it
> from the email message, or for finding the hash. Both of these require
> looking into the content of the patch, which means we need to be able to
> decode it.

I don't get it. You can process and look into a byte array as easily as
you can process a unicode string.

patch(1) operates at the byte level; it doesn't care about unicode or
character encodings. It just gets a description of byte-level changes to
source files, so we don't need to pretend that every diff is going to be
valid unicode.

I understand it is hard to change this in Patchwork today, though. It
would affect the database models and the xmlrpc interface.

> > The restriction that every patch should be valid unicode makes it
> > impossible to patch existing source files that already have non-utf8
> > data inside them (I suppose this includes source trees where files are
> > encoded as iso-8859-1, as the unicode diff won't be encoded back to the
> > original encoding when exporting the patches from Patchwork, will it?).
> >
> > This would require changing the database model and xmlrpc API to use
> > binary data (I hope Django support it)
>
> no, django doesn't support it out of the box, I believe this is a django
> design decision.

Ouch. I understand that discouraging storing binary data is a good
thing, but I didn't expect Django to simply not allow it.

-- 
Eduardo
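
To make the byte-level point concrete, here is a minimal Python sketch of
inspecting and hashing a diff without ever decoding it. It is not
Patchwork's code: RAW_PATCH, HUNK_RE, count_hunks() and patch_hash() are
names made up for illustration, the hashing rule is an assumption, and it
uses Python 3 'bytes' where the discussion above refers to the Python 2
'str' type.

```python
import hashlib
import re

# Hypothetical patch: the removed line contains the ISO-8859-1 byte 0xE9,
# which is not valid UTF-8 in this position.
RAW_PATCH = (
    b"--- a/README\n"
    b"+++ b/README\n"
    b"@@ -1 +1 @@\n"
    b"-caf\xe9\n"
    b"+cafe\n"
)

# A bytes pattern: hunk headers are found without decoding anything.
HUNK_RE = re.compile(rb"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def count_hunks(patch: bytes) -> int:
    """Count hunk headers by matching directly on the raw bytes."""
    return len(HUNK_RE.findall(patch))


def patch_hash(patch: bytes) -> str:
    """Hash the added/removed lines only, staying in bytes throughout.

    Illustrative rule, not Patchwork's real hash. As a simplification,
    any line starting with '---' or '+++' is treated as a file header,
    so a changed line whose content begins with '--' or '++' is skipped.
    """
    digest = hashlib.sha1()
    for line in patch.splitlines():
        if line.startswith((b"---", b"+++")):
            continue
        if line.startswith((b"+", b"-")):
            digest.update(line + b"\n")
    return digest.hexdigest()
```

Calling RAW_PATCH.decode("utf-8") raises UnicodeDecodeError, yet
count_hunks(RAW_PATCH) and patch_hash(RAW_PATCH) both work, because
neither needs to know the encoding of the files being patched.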
