[issue17618] base85 encoding

2014-03-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1853679c6f71 by R David Murray in branch 'default':
whatsnew: base85 encodings. (#17618)
http://hg.python.org/cpython/rev/1853679c6f71

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17618] base85 encoding

2013-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Now committed, thanks for the reviews and the code!

--
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed




[issue17618] base85 encoding

2013-11-17 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 42366e293b7b by Antoine Pitrou in branch 'default':
Issue #17618: Add Base85 and Ascii85 encoding/decoding to the base64 module.
http://hg.python.org/cpython/rev/42366e293b7b

--
nosy: +python-dev




[issue17618] base85 encoding

2013-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

One more nitpick, and then the patch LGTM.

--




[issue17618] base85 encoding

2013-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Updated patch after Serhiy's comments.

--
Added file: http://bugs.python.org/file32672/base85-3.patch




[issue17618] base85 encoding

2013-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Grr.

--




[issue17618] base85 encoding

2013-11-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> I added more comments on Rietveld. 

Did you forget to publish them?

--




[issue17618] base85 encoding

2013-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I added more comments on Rietveld.

--




[issue17618] base85 encoding

2013-11-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Updated patch incorporating Serhiy's self-review from 6 months ago (grr).

--
Added file: http://bugs.python.org/file32661/base85-2.patch




[issue17618] base85 encoding

2013-11-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Updated patch with suggested API changes, + docs.

--
Added file: http://bugs.python.org/file32659/base85.patch




[issue17618] base85 encoding

2013-10-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Well, I think the following comments (Serhiy's) should be implemented:

"""As for the interface, I think the 'adobe' flag should be false by default.
It makes the encoder simpler. The ascii85 encoder in Go's standard library
neither wraps nor adds Adobe's brackets. The btoa/atob functions look
redundant, as we can just use a85encode/a85decode with appropriate options."""

--




[issue17618] base85 encoding

2013-10-06 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> pitrou




[issue17618] base85 encoding

2013-10-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm not very interested in working on this (although analyzing and optimizing
it was fun for me). You, Antoine, as the originator, definitely are
interested. So decide on the interface you need and finish the work using the
proposed patches as a basis. I will review it.

I have some doubts about whether a base85 codec belongs in the base64 module
("This module provides data encoding and decoding as specified in RFC 3548.").
Base85 is not a standard. But I don't see a better place for it. At the least,
the description of the base64 module should be corrected.

I suggest resolving issue16995 first. Perhaps it will yield suggestions about
the base85 interface.

--




[issue17618] base85 encoding

2013-10-05 Thread Jason Stokes

Jason Stokes added the comment:

What issues are there with the implementation as it stands? I am happy to 
contribute (as I need to code a base36 implementation myself, and it's 
basically the same work) but it looks like the existing implementation is fine, 
except possibly some people don't like "adobe" being implemented by default?

--
nosy: +glasper




[issue17618] base85 encoding

2013-09-24 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +haypo




[issue17618] base85 encoding

2013-08-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Serhiy, Martin, is one of you still working on this?

--




[issue17618] base85 encoding

2013-04-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> The problem with autodetecting is that it makes it impossible for an
> application to use this library to verify that something is encoded in 
> a specific way. Explicit is better than implicit. 

Agreed. Also, you generally know what format your data is in. Otherwise, how
do you know that it is base85 rather than, say, base64 or uuencode?

--




[issue17618] base85 encoding

2013-04-21 Thread Martin Morrison

Martin Morrison added the comment:

On 21 Apr 2013, at 17:38, Serhiy Storchaka wrote:
> Serhiy Storchaka added the comment:
>
> As for the interface, I think the 'adobe' flag should be false by default.
> It makes the encoder simpler. The ascii85 encoder in Go's standard library
> neither wraps nor adds Adobe's brackets. The btoa/atob functions look
> redundant, as we can just use a85encode/a85decode with appropriate options.
> Perhaps we should get rid of the 'adobe' flag in a85decode and autodetect
> it, and perhaps do the same with a85decode's other options.

The problem with autodetecting is that it makes it impossible for an 
application to use this library to verify that something is encoded in a 
specific way. Explicit is better than implicit. 
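
The point about verification can be illustrated with a minimal sketch. It assumes Python 3.4 or later, where base64.a85decode takes an explicit adobe keyword; the helper name is_adobe_framed is hypothetical and exists only for this example.

```python
import base64

data = b"explicit is better than implicit"

plain = base64.a85encode(data)               # bare Ascii85, no framing
framed = base64.a85encode(data, adobe=True)  # enclosed in <~ and ~>


def is_adobe_framed(blob: bytes) -> bool:
    """Hypothetical helper: True only if blob decodes as Adobe-framed Ascii85."""
    try:
        base64.a85decode(blob, adobe=True)
    except ValueError:
        # With adobe=True the decoder insists on the trailing ~> marker.
        return False
    return True


print(is_adobe_framed(framed))  # the caller asked for Adobe framing and got it
print(is_adobe_framed(plain))   # unframed input is rejected, not guessed at
```

Because the caller states the expected variant, a mismatch surfaces as an error instead of being silently accepted.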

Otherwise, your changes look good to me.

--




[issue17618] base85 encoding

2013-04-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

As for the interface, I think the 'adobe' flag should be false by default. It
makes the encoder simpler. The ascii85 encoder in Go's standard library
neither wraps nor adds Adobe's brackets. The btoa/atob functions look
redundant, as we can just use a85encode/a85decode with appropriate options.
Perhaps we should get rid of the 'adobe' flag in a85decode and autodetect it,
and perhaps do the same with a85decode's other options.

--




[issue17618] base85 encoding

2013-04-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There are some bugs in the ascii85 and base85 implementations (see Rietveld
for details). Besides, the ascii85 implementation was too slow. I've prepared
a patch that corrects the errors and speeds up encoding and decoding.

Microbenchmarks:

./python -m timeit -r 1 -n 1 -s "from base64 import a85encode as encode; data = open('python', 'rb').read(101)" "encode(data)"
./python -m timeit -r 1 -n 1 -s "from base64 import b85encode as encode; data = open('python', 'rb').read(101)" "encode(data)"
./python -m timeit -r 1 -n 1 -s "from base64 import a85encode as encode, a85decode as decode; data = encode(open('python', 'rb').read(101))" "decode(data)"
./python -m timeit -r 1 -n 1 -s "from base64 import b85encode as encode, b85decode as decode; data = encode(open('python', 'rb').read(101))" "decode(data)"

            Old patch  New patch
a85encode   8.4 sec    1.13 sec
b85encode   1.35 sec   1.09 sec
a85decode   9.28 sec   3.29 sec
b85decode   3.17 sec   2.37 sec

--
Added file: http://bugs.python.org/file29956/issue17618-fast.diff




[issue17618] base85 encoding

2013-04-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Added file: http://bugs.python.org/file29942/issue17618-5.diff




[issue17618] base85 encoding

2013-04-18 Thread Martin Morrison

Martin Morrison added the comment:

Attached a minor tweak over the last diff - I'd forgotten to fix the Struct 
handling inside the Mercurial implementation as well.

All other comments still apply to this diff.

--
Added file: http://bugs.python.org/file29930/issue17618-5.diff




[issue17618] base85 encoding

2013-04-18 Thread Martin Morrison

Martin Morrison added the comment:

Raised http://bz.selenic.com/show_bug.cgi?id=3894 against Mercurial for them to
work around issue14596.

--




[issue17618] base85 encoding

2013-04-17 Thread Martin Morrison

Martin Morrison added the comment:

New diff. Changes from the last one:

- change in struct handling to avoid issue14596

- Addition of btoa85 and atob85 functions that do legacy 'btoa' 
encoding/decoding. These are just wrappers around a85(en|de)code, which now 
have additional keyword args to control wrapping, padding, framing, and 
whitespace skipping

- New tests covering all 3 variants
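
As a rough illustration of keyword-controlled wrapping, padding and framing, here is a sketch using the keyword names that the released base64 module exposes in Python 3.4 and later (wrapcol, pad, adobe). The patch described in this message may have spelled them differently; that is an assumption of the example.

```python
import base64

data = bytes(range(1, 46))  # 45 bytes, deliberately not a multiple of 4

# wrapcol: break the output so that no line is longer than 20 characters
wrapped = base64.a85encode(data, wrapcol=20)
# pad: zero-pad the input to a multiple of 4 bytes before encoding (btoa style)
padded = base64.a85encode(data, pad=True)
# adobe: enclose the output in <~ and ~>
framed = base64.a85encode(data, adobe=True)

# The decoder skips newlines by default (see its ignorechars parameter).
roundtrip = base64.a85decode(wrapped)
# Padding is not undone on decode: the trailing NUL bytes stay in the result.
padded_back = base64.a85decode(padded)

print(wrapped.decode("ascii"))
print(len(data), len(padded_back))
```

The three options are independent, which is what lets the btoa-style behaviour be expressed as a thin wrapper around one encoder.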

--
Added file: http://bugs.python.org/file29911/issue17618-4.diff




[issue17618] base85 encoding

2013-04-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Serhiy, Martin, perhaps one of you could report the potential memory leak on 
the Mercurial bug tracker: http://bz.selenic.com/

--




[issue17618] base85 encoding

2013-04-17 Thread Martin Morrison

Martin Morrison added the comment:

>> Can you elaborate on this? What leakage is there? I assume this is some
>> implementation quirk of the struct module that I'm not aware of.
>
> issue14596.

Thanks for the pointer. I will rework the patch for the encoder/decoders
to use an explicit Struct so that the inbuilt cache gets bypassed and we
don't "leak".
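
A minimal sketch of the "explicit Struct" idea, not the code from the patch: one fixed-format struct.Struct is compiled once and reused for every 4-byte group, so no format string depends on the input length. The names _WORD and iter_words are hypothetical.

```python
import struct

# One fixed-format Struct, compiled once at import time. A length-dependent
# format such as '!%dI' % n would instead produce a different format string
# for nearly every input size.
_WORD = struct.Struct("!I")  # big-endian unsigned 32-bit integer


def iter_words(data: bytes):
    """Yield big-endian 32-bit words, zero-padding the final partial group."""
    padded = data + b"\0" * ((-len(data)) % 4)
    for (word,) in _WORD.iter_unpack(padded):
        yield word


print(list(iter_words(b"\x00\x00\x00\x01\x02")))
```

The cost is one unpack step per group rather than a single bulk call, which is the trade-off discussed in this thread.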

--




[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Can you elaborate on this? What leakage is there? I assume this is some
> implementation quirk of the struct module that I'm not aware of.

issue14596.

--




[issue17618] base85 encoding

2013-04-17 Thread Martin Morrison

Martin Morrison added the comment:

> The trick with struct.unpack() has a very unpleasant side effect.
> It might speed up encoding a little, but it creates a Struct object
> whose size is many times larger than the size of the processed
> data. Worse, this object is cached and continues to consume memory.
> Since the size of the data will most likely be unique, almost every
> call of b85encode creates a new object. This will lead to memory
> leaks.

Can you elaborate on this? What leakage is there? I assume this is some
implementation quirk of the struct module that I'm not aware of.

> On Wednesday, 17 April 2013 at 18:14 +, Serhiy Storchaka wrote:
>> I think we can provide a universal solution compatible (with some
>> pre/postprocessing) with both variants: whether to enclose the encoded
>> data in <~ and ~>, and at which column to wrap it. Padding can easily
>> be implemented as preprocessing (data + (-len(data)) % 4 * b'\0').
>
> That's ok with me. It's just more work for whoever does it :-)

As I mentioned in one of my previous comments, I was trying very hard 
not to touch the Mercurial solution (b85(en|de)code in the latest 
patch), and just copy it wholesale. Mostly, I don't really like the way 
the solution reads (unpythonic in my eyes), but can understand that for 
this kind of thing that might be the best way.

In my solution (a85(en|de)code) I wrote it from scratch in what I felt 
was a readable way. I can quite easily extend my version to support your 
description of the btoa/atob version (i.e. no bracketing, always pad, 
always wrap output).

I'm less convinced it's sensible to merge the ascii85 implementations 
and the Mercurial b85 one. If you really want that though, I would be in 
favour of using my a85 implementation and just changing the encode inner 
function to use the lookup table.

(we can do all this independently of the function names, which I think 
Antoine and I are agreed should be separate for the different 
implementations)

>> As for Git/Mercurial's base85, what other applications use this
>> encoding?
>
> I don't know, but they use it to produce binary diffs ("diff" chunks
> of binary files), so any application wanting to parse Mercurial/Git
> diffs would have to recognize base85 data.
>
> (and I also like that the Mercurial/Git variant is the simpler of
> all 3 :-))

I actually prefer the Ascii85 one for the simplicity of the encoding 
(shift base 85 chunks of the input by 33 to get into the printable ascii 
range) rather than the clunky lookup table approach. To each their own. :-)
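
The "shift by 33" description can be sketched for a single full group. This is an illustration only, with a hypothetical function name; it deliberately omits the 'z' shortcut and the handling of a short final group.

```python
import base64
import struct


def a85_group(chunk: bytes) -> bytes:
    """Encode exactly 4 bytes as 5 Ascii85 characters (no 'z' shortcut)."""
    (word,) = struct.unpack("!I", chunk)  # 4 bytes -> one 32-bit integer
    digits = []
    for _ in range(5):                    # peel off five base-85 digits
        word, digit = divmod(word, 85)
        digits.append(digit + 33)         # shift into the range '!'..'u'
    return bytes(reversed(digits))        # most significant digit first


print(a85_group(b"Man "))                 # b'9jqo^'
print(base64.a85encode(b"Man "))          # same result from the stdlib
```

No table is needed because the 85 output characters are contiguous in ASCII.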

--




[issue17618] base85 encoding

2013-04-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

On Wednesday, 17 April 2013 at 18:14 +, Serhiy Storchaka wrote:
> I think we can provide a universal solution compatible (with some
> pre/postprocessing) with both variants: whether to enclose the encoded
> data in <~ and ~>, and at which column to wrap it. Padding can easily
> be implemented as preprocessing (data + (-len(data)) % 4 * b'\0').

That's ok with me. It's just more work for whoever does it :-)

> As for Git/Mercurial's base85, what other applications use this
> encoding?

I don't know, but they use it to produce binary diffs ("diff" chunks of
binary files), so any application wanting to parse Mercurial/Git diffs
would have to recognize base85 data.

(and I also like that the Mercurial/Git variant is the simplest of all
3 :-))

--




[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> btoa/atob seems extinct.

At least half of the ascii85 encoders in the wild implement this variant.

I think we can provide a universal solution compatible (with some
pre/postprocessing) with both variants: whether to enclose the encoded data in
<~ and ~>, and at which column to wrap it. Padding can easily be implemented
as preprocessing (data + (-len(data)) % 4 * b'\0').
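
The padding expression above works because % and * bind more tightly than + and associate left to right, so it reads as data + (((-len(data)) % 4) * b'\0'). A small sketch, with a hypothetical function name:

```python
def pad_to_word_boundary(data: bytes) -> bytes:
    """Append NUL bytes until len(data) is a multiple of 4."""
    # (-len(data)) % 4 is the number of bytes missing from the last group:
    # 0 when the length is already a multiple of 4, otherwise 1, 2 or 3.
    return data + (-len(data)) % 4 * b"\0"


for sample in (b"", b"a", b"abcd", b"abcde"):
    print(len(sample), len(pad_to_word_boundary(sample)))
```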

As for Git/Mercurial's base85, what other applications use this encoding?

--




[issue17618] base85 encoding

2013-04-17 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> After searching a lot of other implementations of this encoding I
> conclude that there are at least three different variants.

Yes. The current proposal is to include both the Adobe version ("ascii85")
and the Mercurial/Git version ("base85"). btoa/atob seems extinct.

--




[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

After looking at many other implementations of this encoding, I conclude that
there are at least three different variants.

1. The original btoa/atob encoding. Four zero bytes are packed as 'z', the
last incomplete group of 4 bytes is padded with zeros, the output is wrapped
into several lines, and the decoder ignores '\n'. There are many
implementations of this algorithm in different languages.

2. The Adobe version. This is an extended version of (1). The last incomplete
group of 4 bytes produces fewer than 5 output characters, and the output is
enclosed in <~ and ~>. The decoder ignores all ASCII whitespace, not only
'\n'. There are many implementations of this algorithm in different languages.

3. The Git and Mercurial version. This is a very simplified version of (1)
with an alternative character set. Zeros are not packed, the output is not
broken into several lines, and the decoder doesn't ignore any whitespace. I
don't know whether this variant is used anywhere besides Git and Mercurial.

Some implementations combine (1) and (2) (optionally enclosing the output in
<~ and ~>, optionally wrapping the output into several lines, optionally
padding the last incomplete group of 4 bytes).
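
The differences between the variants can be observed with the functions that eventually shipped in the base64 module. This sketch assumes Python 3.4 or later and uses a85encode's pad and adobe keywords to approximate variants (1) and (2).

```python
import base64

zeros = b"\0" * 4
tail = b"ab"  # an incomplete group of 2 bytes

# Variants (1) and (2): four zero bytes collapse to a single 'z'.
plain_zeros = base64.a85encode(zeros)
# Variant (2): same shortcut, plus the <~ ... ~> brackets.
adobe_zeros = base64.a85encode(zeros, adobe=True)
# Variant (3): no shortcut; digit zero of the alternative alphabet is '0'.
git_zeros = base64.b85encode(zeros)

# Variant (2) emits n + 1 characters for an n-byte final group ...
short_tail = base64.a85encode(tail)
# ... while btoa-style padding always emits a full 5-character group.
padded_tail = base64.a85encode(tail, pad=True)

print(plain_zeros, adobe_zeros, git_zeros)
print(len(short_tail), len(padded_tail))
```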

--




[issue17618] base85 encoding

2013-04-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

After a more careful look at the b85encode code, I'd say that its
implementation is not optimal. For the sake of simplicity the entire volume of
data is copied several times. This can affect the processing of large volumes
of data. On the other hand, this dumb copying can be faster than the smarter
processing in a85encode. Only benchmarks will show the truth.

The trick with struct.unpack() has a very unpleasant side effect. It might
speed up encoding a little, but it creates a Struct object whose size is many
times larger than the size of the processed data. Worse, this object is cached
and continues to consume memory. Since the size of the data will most likely
be unique, almost every call of b85encode creates a new object. This will lead
to memory leaks.

--




[issue17618] base85 encoding

2013-04-14 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Hi and thanks for the patch!

> I named the Mercurial base85 implementation functions with the "b85"
> prefix. For the Ascii85 ones, I used "a85". I considered overloading
> the same functions with a keyword argument to select which encoding,
> but rejected that. Thoughts?

I agree, it's better like this.

> I haven't made the changes to add a pure Python binascii module as
> suggested in msg186216. Although poorly named, "base64.py" already contains 
> a number of other encodings, so this seemed the best place for these too.

Yes, I think it's ok. I was thinking about binascii in the context of making a 
C version, but we can refactor things later anyway.

Now about the patch: I haven't read it in detail, but it seems to lack tests 
for b85decode and b85encode. It should be easy enough to get some test values 
by calling Mercurial's version.

--




[issue17618] base85 encoding

2013-04-14 Thread Martin Morrison

Martin Morrison added the comment:

I've updated the Ascii85 algorithms to remove the quadratic complexity, and use 
a single struct.pack/unpack. They should now be much quicker for large input 
strings.

It's difficult to factor out commonality with b85* because the encodings and 
rules differ. This is especially true for decode (where Ascii85 allows 
arbitrary whitespace, so it either has to be stepped through as I've 
implemented it, or it would have to first be sanitised with .replace() or 
similar, which is expensive for large inputs). For encode, the special cases 
supported by Ascii85 make it impossible to *just* use a lookup table, and the 
simplified algorithm for encoding means it isn't necessary to use one at all. I 
also wanted to keep the Mercurial code intact as much as possible, so it can be 
kept in sync in future if necessary.
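
The whitespace difference between the two decoders can be checked with a short sketch. It assumes the behaviour of the released base64 module (Python 3.4 or later); the helper sprinkle is hypothetical and just injects a space and a newline after every 5 characters.

```python
import base64

data = b"whitespace handling"
a85 = base64.a85encode(data)
b85 = base64.b85encode(data)


def sprinkle(blob: bytes) -> bytes:
    """Hypothetical helper: insert ' \\n' after every 5 encoded characters."""
    return b" \n".join(blob[i:i + 5] for i in range(0, len(blob), 5))


# Ascii85 decoding skips whitespace by default, so the data survives.
a85_ok = base64.a85decode(sprinkle(a85)) == data

# The Git/Mercurial flavour treats whitespace as an invalid character.
try:
    base64.b85decode(sprinkle(b85))
    b85_rejected = False
except ValueError:
    b85_rejected = True

print(a85_ok, b85_rejected)
```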

My notes from the previous diff also still apply if anyone has thoughts on 
those.

--
Added file: http://bugs.python.org/file29852/issue17618-3.diff




[issue17618] base85 encoding

2013-04-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I would like both algorithms to be as similar as possible. It might be worth
extracting and reusing common code. Mercurial's code looks far more optimal
(for example, a85encode has quadratic complexity when accumulating the
result).

--




[issue17618] base85 encoding

2013-04-13 Thread Martin Morrison

Martin Morrison added the comment:

Updated patch that includes both my original implementation of Ascii85, as well 
as the Mercurial implementation of base85. A few notes/questions:

- I named the Mercurial base85 implementation functions with the "b85" prefix. 
For the Ascii85 ones, I used "a85". I considered overloading the same functions 
with a keyword argument to select which encoding, but rejected that. Thoughts?

- I made only minor modifications to the Mercurial code to use bytes throughout 
(to match all the other APIs in the module). I also updated the docstrings a 
bit. My goal was to change as little as possible to guarantee identical 
behaviour.

- I haven't made the changes to add a pure Python binascii module as suggested 
in msg186216. Although poorly named, "base64.py" already contains a number of 
other encodings, so this seemed the best place for these too. I'm happy to make 
that change as well though if you really want it as part of this issue.

--
Added file: http://bugs.python.org/file29838/issue17618-2.diff




[issue17618] base85 encoding

2013-04-13 Thread Martin Morrison

Martin Morrison added the comment:

Ok, great. I'll update the patch to include both encoding schemes.

--




[issue17618] base85 encoding

2013-04-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

For the record, Mads and Brendan have submitted a contributor's agreement, so 
we can now take what we want from Mercurial's base85.py (which you can find at 
http://selenic.com/hg/file/4e1ae55e63ef/mercurial/pure/base85.py).

--




[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> So I'm not sure what you want to do. I would suggest a standard
> Ascii85 encoder is definitely useful, and provides feature parity with
> Ruby. If we want the standard library to be able to read/write
> Mercurial/Git base85 encoded files, then I guess that can be added
> too. If we think RFC1924 is useful/used, then the implementation in
> the netaddr lib looks right.

Agreed for both the Ascii85 encoder and the hg/git brand of base85
(which is used for "binary diffs", by the way). I don't think supporting
RFC1924 is useful, though.

(I think using "ascii85" and "base85" for those encodings, respectively,
provides a nice way to distinguish them)

--




[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison

Martin Morrison added the comment:

Ok, I'm not even sure that Mercurial follows RFC1924! That RFC is specifically 
for encoding IPv6 addresses, and mandates that the calculations be performed on 
a 128bit integer.

The Mercurial implementation seems to follow the Ascii85 policy of taking each 
4 bytes separately and doing 32bit arithmetic, but uses the lookup table from 
RFC1924, and is less lenient about spacing, and has no compression for 
sequences of zeroes.
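
The scheme described above (32-bit arithmetic on each 4-byte group, RFC1924's character table, no zero compression) can be sketched for one full group. This is an illustration rather than Mercurial's code; B85_ALPHABET and b85_group are names chosen for the example, and the result is compared against base64.b85encode from Python 3.4 or later.

```python
import base64
import struct

# The 85-character table from RFC 1924: digits, upper case, lower case,
# then 23 punctuation characters.
B85_ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")


def b85_group(chunk: bytes) -> bytes:
    """Encode exactly 4 bytes as 5 characters via the lookup table."""
    (word,) = struct.unpack("!I", chunk)  # 4 bytes -> one 32-bit integer
    out = bytearray(5)
    for i in range(4, -1, -1):            # fill least significant digit last
        word, digit = divmod(word, 85)
        out[i] = B85_ALPHABET[digit]
    return bytes(out)


print(b85_group(b"Man "), base64.b85encode(b"Man "))
print(b85_group(b"\0\0\0\0"))             # b'00000': zeros are not compressed
```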

It therefore looks like Mercurial (and I guess Git) have their own, 
non-standard base85 encoding. The Web at large mostly has "standard" Ascii85
encoding/decoding described. RFC1924 itself has a Python implementation on 
Github:

https://github.com/drkjam/netaddr/blob/rel-0.7.x/netaddr/ip/rfc1924.py

So I'm not sure what you want to do. I would suggest a standard Ascii85 encoder 
is definitely useful, and provides feature parity with Ruby. If we want the 
standard library to be able to read/write Mercurial/Git base85 encoded files,
then I guess that can be added too. If we think RFC1924 is useful/used, then 
the implementation in the netaddr lib looks right.

--




[issue17618] base85 encoding

2013-04-07 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +serhiy.storchaka




[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> The Ascii85 version is what is used with PDF, and the default output
> format for the equivalent Ruby library, so seems useful to have. So I
> guess what might be desirable is to have both in the codebase?

Yes, it could be useful to have both.

--




[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison

Martin Morrison added the comment:

Ok, having now looked at mercurial's implementation... it looks like they 
implemented the RFC1924 version, whereas my implementation is the Ascii85 
version (and I verified it against, amongst others: 
http://www.tools4noobs.com/online_tools/ascii85_encode/ ).

The Ascii85 version is what is used with PDF, and the default output format for 
the equivalent Ruby library, so seems useful to have. So I guess what might be 
desirable is to have both in the codebase?

--




[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Forgot to mention, I included an optional keyword argument to support
> the 'btoa' shortcut for sequences of space characters as described in
> the Wikipedia article. However, I'm unsure if any other implementation
> supports this, so might not be worth including it in our
> implementation.

In this issue I would really like to aim for Mercurial/git-like
behaviour: i.e. no special shortcuts, and no armoring ('<~' and '~>').
Also, the base85 alphabet used by Mercurial and git may be different, I
haven't checked.

(by the way, it seems "btoa" has been dead for a long time, I don't
think it's useful as a reference here)

--




[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison

Martin Morrison added the comment:

(sorry for spam)

Forgot to mention, I included an optional keyword argument to support the 
'btoa' shortcut for sequences of space characters as described in the Wikipedia 
article. However, I'm unsure if any other implementation supports this, so 
might not be worth including it in our implementation.

A better feature might be to support full btoa output, but the Wikipedia
article doesn't have a complete enough specification, and I couldn't find
(didn't really look for) one elsewhere. If no one uses it, though, it is
again probably not worth including.

--




[issue17618] base85 encoding

2013-04-07 Thread Martin Morrison

Martin Morrison added the comment:

I wrote an implementation from scratch (based on the wikipedia article; I've 
not looked at any existing implementations) in pure Python in the attached 
diff. It includes tests.

Feel free to use it as the pure Python implementation if desired, though I 
won't be offended if you just end up using the Mercurial one. :-)

--
keywords: +patch
nosy: +isoschiz
Added file: http://bugs.python.org/file29717/issue17618.diff




[issue17618] base85 encoding

2013-04-07 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The Mercurial authors have given their informal agreement for a relicensing. 
OTOH, they must still sign a contributor's agreement. The relicensing would 
allow us to use their pure Python implementation (in mercurial/pure/base85.py). 
OTOH, the C implementation (in mercurial/base85.c) is a ripoff of the git one, 
so we'd better rewrite our own.

My current plan would be the following:
- create a binascii.py and rename binascii.c to _binascii.c
- add Mercurial's pure Python implementation to binascii.py
- add a custom C implementation to _binascii.c
- make sure the binascii test suite tests both implementations

OTOH, if we don't get the Mercurial authors' contributor agreement, we can also 
re-write our own pure Python implementation.

In any case, our implementation should IMHO be compatible with Mercurial's 
(i.e. produce identical outputs for the same inputs).

--
stage:  -> needs patch




[issue17618] base85 encoding

2013-04-07 Thread R. David Murray

R. David Murray added the comment:

Antoine is talking to Mercurial about relicensing, and I believe at this point
it is just a matter of working out the mechanical details (that is, he has an
agreement in principle from them).

--
nosy: +r.david.murray




[issue17618] base85 encoding

2013-04-07 Thread Sijin Joseph

Sijin Joseph added the comment:

Is anyone working on this? I'd like to include this in a CPython sprint @MIT on 
4/13.

--
nosy: +sijinjoseph




[issue17618] base85 encoding

2013-04-02 Thread Jesús Cea Avión

Changes by Jesús Cea Avión :


--
keywords: +easy




[issue17618] base85 encoding

2013-04-02 Thread Jesús Cea Avión

Changes by Jesús Cea Avión :


--
nosy: +jcea




[issue17618] base85 encoding

2013-04-02 Thread Florent Xicluna

Changes by Florent Xicluna :


--
nosy: +flox




[issue17618] base85 encoding

2013-04-02 Thread Antoine Pitrou

New submission from Antoine Pitrou:

Base85 encoding (see e.g. http://en.wikipedia.org/wiki/Ascii85 ) allows a 
tighter encoding than Base64: it has a 5/4 expansion ratio, rather than 4/3.
It is used in Mercurial, git, and there's another variant that's used by Adobe 
in the PDF format.
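
The two expansion ratios can be checked on a payload whose length is a multiple of both 3 and 4, so neither encoding needs padding. The sketch assumes Python 3.4 or later, where base64.b85encode (the function this issue ended up adding) is available.

```python
import base64

payload = bytes(range(240)) * 5  # 1200 bytes, divisible by both 3 and 4

b64_len = len(base64.b64encode(payload))  # 4 characters per 3 bytes
b85_len = len(base64.b85encode(payload))  # 5 characters per 4 bytes

print(len(payload), b64_len, b85_len)     # 1200 1600 1500
```

For this input Base85 output is 1500 characters against 1600 for Base64, a saving of about 6%.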

It would be nice to have a Base85 implementation in either the binascii or 
base64 modules.

(unfortunately the Mercurial implementation is GPL'ed, although if we want to 
copy it we might simply ask them for a relicensing)

--
components: Library (Lib)
messages: 185835
nosy: christian.heimes, pitrou
priority: normal
severity: normal
status: open
title: base85 encoding
type: enhancement
versions: Python 3.4
