[issue14339] Optimizing bin, oct and hex

2012-04-27 Thread Mark Dickinson

Mark Dickinson  added the comment:

Serhiy:  thanks for submitting the form!  No need to apologise: it was my bad 
for making the commit prematurely.

And now that I see that all-important asterisk next to your name, I'll reclose 
the issue. :-)

--
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-27 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I sent the signed form. Martin, sorry for the delay. Mark, sorry, that
involuntarily let you down.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-27 Thread STINNER Victor

STINNER Victor  added the comment:

> You may call assert(_PyUnicode_CheckConsistency(v, 1)) to ensure
> that the newly created string is "consistent" (see the function
> for the details).

Done in the following changeset:

changeset:   76560:3bdcf0cab164
parent:  76558:5fea362b92fc
user:Victor Stinner 
date:Thu Apr 26 00:37:21 2012 +0200
files:   Objects/longobject.c
description:
long_to_decimal_string() and _PyLong_Format() check the consistency of newly
created strings using _PyUnicode_CheckConsistency() in debug mode

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-26 Thread Martin v . Löwis

Martin v. Löwis  added the comment:

Serhiy: what's the status of your contributor form? Note that you can also fax 
it, or scan it and send it by email.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-24 Thread STINNER Victor

STINNER Victor  added the comment:

2.100 -v = PyUnicode_DecodeASCII(p, &buffer[sz] - p, NULL);
...
   2.104 +assert(p == PyUnicode_1BYTE_DATA(v));
   2.105  return v;
   2.106  }

You may call assert(_PyUnicode_CheckConsistency(v, 1)) to ensure that the newly 
created string is "consistent" (see the function for the details).

--
nosy: +haypo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-23 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I can only do this tomorrow.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-20 Thread Mark Dickinson

Mark Dickinson  added the comment:

Whoops;  that looks like a slightly older version of the form.  I think the 
correct one is:

http://www.python.org/psf/contrib/contrib-form/

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-20 Thread Mark Dickinson

Mark Dickinson  added the comment:

Aargh.  Sorry, yes.  Serhiy, can you do this?  The form can be found at:

http://www.python.org/psf/contrib/contrib-form-python/

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-20 Thread Martin v . Löwis

Martin v. Löwis  added the comment:

This should have waited until Serhiy submits a contributor form. Serhiy, can 
you please do this soon? Else I'll have to revert the change.

--
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-20 Thread Mark Dickinson

Mark Dickinson  added the comment:

Thanks for the patch. I made some minor changes, notably moving the overflow 
check closer to where it's needed, moving some comments around, and removing a 
(possibly inappropriate) PyErr_NoMemory call.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-20 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset dcd3344b6d5b by Mark Dickinson in branch 'default':
Issue #14339: Improve speed of bin, oct and hex builtins.  Patch by Serhiy 
Storchaka (with minor modifications).
http://hg.python.org/cpython/rev/dcd3344b6d5b

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-17 Thread Mark Dickinson

Changes by Mark Dickinson :


--
assignee:  -> mark.dickinson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-16 Thread Mark Dickinson

Mark Dickinson  added the comment:

> There is a guarantee that the shortest form must always be used.

Okay, sounds good.  Thanks.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-16 Thread Martin v . Löwis

Martin v. Löwis  added the comment:

> (1) The patch appears to assume that a Unicode string created with
> PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND.
> While this might be true in the current implementation, I don't know
> whether this is guaranteed in general.  Martin, any comments on
> this?

There is a guarantee that the shortest form must always be used.
So as long as the Unicode representation doesn't fundamentally change
again, this property is indeed guaranteed.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-16 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> I think 'x' would be more appropriate. :-)

Oops. You are right, of cause.

--
Added file: http://bugs.python.org/file25234/long_to_binary_base_3.patch

___
Python tracker 

___diff -r 13307eb5bf47 Objects/longobject.c
--- a/Objects/longobject.c  Sat Apr 14 14:20:29 2012 -0500
+++ b/Objects/longobject.c  Mon Apr 16 12:01:24 2012 +0300
@@ -1672,11 +1672,10 @@
 {
 register PyLongObject *a = (PyLongObject *)aa;
 PyObject *v;
-Py_ssize_t i, sz;
+Py_ssize_t sz;
 Py_ssize_t size_a;
-char *p;
-char sign = '\0';
-char *buffer;
+Py_UCS1 *p;
+int negative;
 int bits;
 
 assert(base == 2 || base == 8 || base == 10 || base == 16);
@@ -1688,6 +1687,7 @@
 return NULL;
 }
 size_a = ABS(Py_SIZE(a));
+negative = Py_SIZE(a) < 0;
 
 /* Compute a rough upper bound for the length of the string */
 switch (base) {
@@ -1706,31 +1706,37 @@
 }
 /* compute length of output string: allow 2 characters for prefix and
1 for possible '-' sign. */
-if (size_a > (PY_SSIZE_T_MAX - 3) / PyLong_SHIFT / sizeof(Py_UCS4)) {
+if (size_a > (PY_SSIZE_T_MAX - 3) / PyLong_SHIFT) {
 PyErr_SetString(PyExc_OverflowError,
 "int is too large to format");
 return NULL;
 }
 /* now size_a * PyLong_SHIFT + 3 <= PY_SSIZE_T_MAX, so the RHS below
is safe from overflow */
-sz = 3 + (size_a * PyLong_SHIFT + (bits - 1)) / bits;
-assert(sz >= 0);
-buffer = PyMem_Malloc(sz);
-if (buffer == NULL) {
+if (size_a == 0) {
+sz = 3;
+}
+else {
+sz = 2 + negative + ((size_a - 1) * PyLong_SHIFT +
+ bits_in_digit(a->ob_digit[size_a - 1]) +
+ (bits - 1)) / bits;
+}
+v = PyUnicode_New(sz, 'x');
+if (v == NULL) {
 PyErr_NoMemory();
 return NULL;
 }
-p = &buffer[sz];
-if (Py_SIZE(a) < 0)
-sign = '-';
-
-if (Py_SIZE(a) == 0) {
+assert(PyUnicode_KIND(v) == PyUnicode_1BYTE_KIND);
+p = PyUnicode_1BYTE_DATA(v) + sz;
+
+if (size_a == 0) {
 *--p = '0';
 }
 else {
 /* JRH: special case for power-of-2 bases */
 twodigits accum = 0;
 int accumbits = 0;  /* # of bits in accum */
+Py_ssize_t i;
 for (i = 0; i < size_a; ++i) {
 accum |= (twodigits)a->ob_digit[i] << accumbits;
 accumbits += PyLong_SHIFT;
@@ -1739,7 +1745,7 @@
 char cdigit;
 cdigit = (char)(accum & (base - 1));
 cdigit += (cdigit < 10) ? '0' : 'a'-10;
-assert(p > buffer);
+assert(p > PyUnicode_1BYTE_DATA(v));
 *--p = cdigit;
 accumbits -= bits;
 accum >>= bits;
@@ -1747,6 +1753,7 @@
 }
 }
 
+assert(p == PyUnicode_1BYTE_DATA(v) + 2 + negative);
 if (base == 16)
 *--p = 'x';
 else if (base == 8)
@@ -1754,10 +1761,8 @@
 else /* (base == 2) */
 *--p = 'b';
 *--p = '0';
-if (sign)
-*--p = sign;
-v = PyUnicode_DecodeASCII(p, &buffer[sz] - p, NULL);
-PyMem_Free(buffer);
+if (negative)
+*--p = '-';
 return v;
 }
 
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-16 Thread Mark Dickinson

Mark Dickinson  added the comment:

Thanks for the updates.


> The same assumption is used above in long_to_decimal_string(). I added
> the same assert and changed maxchar to 'f'.

I think 'x' would be more appropriate. :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-15 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> (1) The patch appears to assume that a Unicode string created with 
> PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND.  While this 
> might be true in the current implementation, I don't know whether this is 
> guaranteed in general.  Martin, any comments on this?

The same assumption is used above in long_to_decimal_string(). I added
the same assert and changed maxchar to 'f'.

> (2) The patch doesn't compile with '--with-pydebug':  there's a reference to 
> the (now) undefined variable 'buffer' in one of the asserts.

Fixed.

> (3) The overflow check looks as though it needs to be reworked.

I was left an old constraint (it is enough), but it really can be
weakened (in fact, it can be so weakened in the current code).

Thank you for the comments and the found error.

--
Added file: http://bugs.python.org/file25227/long_to_binary_base_2.patch

___
Python tracker 

___diff -r 13307eb5bf47 Objects/longobject.c
--- a/Objects/longobject.c  Sat Apr 14 14:20:29 2012 -0500
+++ b/Objects/longobject.c  Sun Apr 15 23:29:34 2012 +0300
@@ -1672,11 +1672,10 @@
 {
 register PyLongObject *a = (PyLongObject *)aa;
 PyObject *v;
-Py_ssize_t i, sz;
+Py_ssize_t sz;
 Py_ssize_t size_a;
-char *p;
-char sign = '\0';
-char *buffer;
+Py_UCS1 *p;
+int negative;
 int bits;
 
 assert(base == 2 || base == 8 || base == 10 || base == 16);
@@ -1688,6 +1687,7 @@
 return NULL;
 }
 size_a = ABS(Py_SIZE(a));
+negative = Py_SIZE(a) < 0;
 
 /* Compute a rough upper bound for the length of the string */
 switch (base) {
@@ -1706,31 +1706,37 @@
 }
 /* compute length of output string: allow 2 characters for prefix and
1 for possible '-' sign. */
-if (size_a > (PY_SSIZE_T_MAX - 3) / PyLong_SHIFT / sizeof(Py_UCS4)) {
+if (size_a > (PY_SSIZE_T_MAX - 3) / PyLong_SHIFT) {
 PyErr_SetString(PyExc_OverflowError,
 "int is too large to format");
 return NULL;
 }
 /* now size_a * PyLong_SHIFT + 3 <= PY_SSIZE_T_MAX, so the RHS below
is safe from overflow */
-sz = 3 + (size_a * PyLong_SHIFT + (bits - 1)) / bits;
-assert(sz >= 0);
-buffer = PyMem_Malloc(sz);
-if (buffer == NULL) {
+if (size_a == 0) {
+sz = 3;
+}
+else {
+sz = 2 + negative + ((size_a - 1) * PyLong_SHIFT +
+ bits_in_digit(a->ob_digit[size_a - 1]) +
+ (bits - 1)) / bits;
+}
+v = PyUnicode_New(sz, 'f');
+if (v == NULL) {
 PyErr_NoMemory();
 return NULL;
 }
-p = &buffer[sz];
-if (Py_SIZE(a) < 0)
-sign = '-';
-
-if (Py_SIZE(a) == 0) {
+assert(PyUnicode_KIND(v) == PyUnicode_1BYTE_KIND);
+p = PyUnicode_1BYTE_DATA(v) + sz;
+
+if (size_a == 0) {
 *--p = '0';
 }
 else {
 /* JRH: special case for power-of-2 bases */
 twodigits accum = 0;
 int accumbits = 0;  /* # of bits in accum */
+Py_ssize_t i;
 for (i = 0; i < size_a; ++i) {
 accum |= (twodigits)a->ob_digit[i] << accumbits;
 accumbits += PyLong_SHIFT;
@@ -1739,7 +1745,7 @@
 char cdigit;
 cdigit = (char)(accum & (base - 1));
 cdigit += (cdigit < 10) ? '0' : 'a'-10;
-assert(p > buffer);
+assert(p > PyUnicode_1BYTE_DATA(v));
 *--p = cdigit;
 accumbits -= bits;
 accum >>= bits;
@@ -1747,6 +1753,7 @@
 }
 }
 
+assert(p == PyUnicode_1BYTE_DATA(v) + 2 + negative);
 if (base == 16)
 *--p = 'x';
 else if (base == 8)
@@ -1754,10 +1761,8 @@
 else /* (base == 2) */
 *--p = 'b';
 *--p = '0';
-if (sign)
-*--p = sign;
-v = PyUnicode_DecodeASCII(p, &buffer[sz] - p, NULL);
-PyMem_Free(buffer);
+if (negative)
+*--p = '-';
 return v;
 }
 
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-15 Thread Mark Dickinson

Mark Dickinson  added the comment:

A few comments:

(1) The patch appears to assume that a Unicode string created with 
PyUnicode_New(size, 127) will have 'kind' PyUnicode_1BYTE_KIND.  While this 
might be true in the current implementation, I don't know whether this is 
guaranteed in general.  Martin, any comments on this?

(2) The patch doesn't compile with '--with-pydebug':  there's a reference to 
the (now) undefined variable 'buffer' in one of the asserts.

(3) The overflow check looks as though it needs to be reworked.

--
nosy: +loewis

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-13 Thread Ezio Melotti

Changes by Ezio Melotti :


--
nosy: +ezio.melotti

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-04-01 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

No, I do not think that any application will significantly speeded up.

In fact, it is not optimization, and getting rid of the apparent pessimization. 
In addition to increasing the speed, memory fragmentation is reduced.

The patch has a minimum size, the new code is not larger and not more complex 
than the old one, it is similar to code already used in this file, there are 
small nonconditional advantages. I see no reason not to accept the changes.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-03-31 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

While optimizing stuff is nice, I'm not sure there's a use case here. Does it 
significantly speed up a real-world workload you're having?

--
nosy: +pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-03-16 Thread Eric V. Smith

Changes by Eric V. Smith :


--
nosy: +eric.smith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-03-16 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
components: +Interpreter Core -Library (Lib)
nosy: +mark.dickinson, skrah
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14339] Optimizing bin, oct and hex

2012-03-16 Thread Serhiy Storchaka

New submission from Serhiy Storchaka :

This patch slightly optimize conversion int to string with base 2, 8 or 16 by 
calculating exactly number of digits and avoiding unnecessary memory 
allocation. The code is similar to used for decimal conversion.

Without patch:
$ ./python -m timeit -s 'x=1' 'bin(x)'
100 loops, best of 3: 0.295 usec per loop
$ ./python -m timeit -s 'x=1' 'oct(x)'
100 loops, best of 3: 0.327 usec per loop
$ ./python -m timeit -s 'x=1' 'hex(x)'
100 loops, best of 3: 0.298 usec per loop
$ ./python -m timeit -s 'x=9**999' 'bin(x)'
10 loops, best of 3: 11.9 usec per loop
$ ./python -m timeit -s 'x=9**999' 'oct(x)'
10 loops, best of 3: 4.55 usec per loop
$ ./python -m timeit -s 'x=9**999' 'hex(x)'
10 loops, best of 3: 3.99 usec per loop

With patch:
$ ./python -m timeit -s 'x=1' 'bin(x)'
100 loops, best of 3: 0.213 usec per loop
$ ./python -m timeit -s 'x=1' 'oct(x)'
100 loops, best of 3: 0.228 usec per loop
$ ./python -m timeit -s 'x=1' 'hex(x)'
100 loops, best of 3: 0.215 usec per loop
$ ./python -m timeit -s 'x=9**999' 'bin(x)'
10 loops, best of 3: 9.76 usec per loop
$ ./python -m timeit -s 'x=9**999' 'oct(x)'
10 loops, best of 3: 3.58 usec per loop
$ ./python -m timeit -s 'x=9**999' 'hex(x)'
10 loops, best of 3: 3.3 usec per loop

--
components: Library (Lib)
files: long_to_binary_base.patch
keywords: patch
messages: 156085
nosy: storchaka
priority: normal
severity: normal
status: open
title: Optimizing bin, oct and hex
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file24891/long_to_binary_base.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com