Martin Morrison added the comment:

> Using a trick with struct.unpack() has very unpleasant side effect.
> It might be a few speed up encoding, but creates the Struct object
> with the size is many times larger than the size of the processed
> data. Worse, this object is cached and continues to consume memory.
> Since the size of the data most likely will be unique, almost every
> call of b85encode creates a new object. This will lead to memory
> leaks.
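To make the concern in the quoted paragraph concrete, here is a minimal sketch. It assumes the "trick" means building a format string whose repeat count depends on the input length; the function names are hypothetical and not taken from the patch. The struct documentation states that compiled formats passed to the module-level functions are cached, so a length-dependent format adds a new entry for each new length, whereas a single fixed Struct is compiled once. The sketch does not measure memory use.

```python
import struct

def words_single_format(data):
    """Unpack all 32-bit big-endian words with one length-specific format.

    Assumes len(data) is a multiple of 4.
    """
    n = len(data) // 4
    # The format string depends on len(data), so each new input length
    # produces a distinct format (and a distinct compiled Struct).
    return struct.unpack('!%dI' % n, data)

# One fixed format, compiled once and reused for every input.
_WORD = struct.Struct('!I')

def words_fixed_format(data):
    """Unpack the same words with a single reusable Struct.

    Assumes len(data) is a multiple of 4.
    """
    return tuple(_WORD.unpack_from(data, offset)[0]
                 for offset in range(0, len(data), 4))
```

Both helpers return the same tuple of words. The second trades one C-level unpack call for a Python-level loop, so it may well be slower; that is the speed-versus-memory trade-off the quoted comment describes.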
Can you elaborate on this? What leakage is there? I assume this is some
implementation quirk of the struct module that I'm not aware of.

> On Wednesday, 17 April 2013 at 18:14 +0000, Serhiy Storchaka wrote:
>> I think we can provide a universal solution compatible (with some
>> pre/postprocessing) with both variants. Enclose encoded data in <~
>> and ~> or not, and at which column wrap an encoded data. Padding
>> can be easy implemented as preprocessing (data + (-len(data)) % 4 *
>> b'\0').
>
> That's ok with me. It's just more work for whoever does it :-)

As I mentioned in one of my previous comments, I was trying very hard
not to touch the Mercurial solution (b85(en|de)code in the latest
patch), and to just copy it wholesale. Mostly, I don't really like the
way the solution reads (unpythonic in my eyes), but I can understand
that for this kind of thing that might be the best way.

In my solution (a85(en|de)code) I wrote it from scratch in what I felt
was a readable way. I can quite easily extend my version to support
your description of the btoa/atob version (i.e. no bracketing, always
pad, always wrap output).

I'm less convinced it's sensible to merge the Ascii85 implementations
and the Mercurial b85 one. If you really want that though, I would be
in favour of using my a85 implementation and just changing the encode
inner function to use the lookup table.

(We can do all this independently of the function names, which I think
Antoine and I are agreed should be separate for the different
implementations.)

>> As for Git/Mercurial's base85, what other applications use this
>> encoding?
>
> I don't know, but they use it to produce binary diffs ("diff" chunks
> of binary files), so any application wanting to parse Mercurial/Git
> diffs would have to recognize base85 data.
>
> (and I also like that the Mercurial/Git variant is the simpler of
> all 3 :-))

I actually prefer the Ascii85 one for the simplicity of the encoding
(shift base 85 chunks of the input by 33 to get into the printable
ASCII range) rather than the clunky lookup table approach. À chacun
son goût. :-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17618>
_______________________________________
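For reference, the padding expression and the two per-chunk encodings discussed in this message can be sketched as follows. This is an illustrative sketch only, not the code from either patch: every function name is hypothetical, and the lookup table is assumed to be the alphabet used by the Git/Mercurial variant. It handles a single 4-byte chunk and leaves out Ascii85 extras such as the `z` shortcut for all-zero chunks, the `<~ ~>` bracketing, and line wrapping.

```python
import struct

# Alphabet assumed for the Git/Mercurial variant (85 printable characters).
B85_ALPHABET = (b"0123456789"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz"
                b"!#$%&()*+-;<=>?@^_`{|}~")

def pad_to_word(data):
    """Pad with NUL bytes to a multiple of 4, using the quoted expression."""
    return data + (-len(data)) % 4 * b'\0'

def base85_digits(word):
    """Split a 32-bit integer into five base-85 digits, most significant first."""
    digits = []
    for _ in range(5):
        word, rem = divmod(word, 85)
        digits.append(rem)
    return digits[::-1]

def encode_chunk_offset(chunk):
    """Ascii85 style: shift each base-85 digit by 33 into printable ASCII."""
    (word,) = struct.unpack('!I', chunk)
    return bytes(d + 33 for d in base85_digits(word))

def encode_chunk_table(chunk):
    """Git/Mercurial style: map each base-85 digit through a lookup table."""
    (word,) = struct.unpack('!I', chunk)
    return bytes(B85_ALPHABET[d] for d in base85_digits(word))
```

The two encoders differ only in the final digit-to-character step, which is why swapping the inner encode function for a table lookup would be enough to share the rest of an implementation.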