[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-23 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

The proposed patch optimizes _PyUnicode_FindMaxChar(). This affects string 
formatting of long patterns (speedup to 15-25% for classic formatting and 5-8% 
for new style formatting).

--
components: Interpreter Core, Unicode
files: unicode_findmaxchar.patch
keywords: patch
messages: 177997
nosy: ezio.melotti, haypo, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Faster _PyUnicode_FindMaxChar()
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file28407/unicode_findmaxchar.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-23 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Added file: http://bugs.python.org/file28408/format_bench.sh

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-24 Thread STINNER Victor

STINNER Victor added the comment:

See issues #14687 and #14716 for benchmarks and information about the changes 
introducing and using _PyUnicodeWriter for str%args and str.format(args).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-24 Thread STINNER Victor

STINNER Victor added the comment:

Avoid scanning a substring to compute the maximum character is a good thing, so 
+1 for the idea. Instead of modifying an existing function, it might be safer 
to rename it. Even if it is private, a third party module *can* use it outside 
Python. It is also surprising that you have to specifiy a "maxchar" argument 
whereas the purpose of the function is to compute the maximum character.

Ex: rename _PyUnicode_FindMaxChar() to _PyUnicode_FindMaxChar2(), and add 
_PyUnicode_FindMaxChar() as the macro:

#define _PyUnicode_FindMaxChar(str, start, length) _PyUnicode_FindMaxChar2(str, 
start, length, 127)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It is not more surprising that specify a "start" argument for sum().

I don't think we should worry about third party module which violate the API. 
It will crash in any case when the exported function will disappear and will be 
replaced by macros.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-24 Thread STINNER Victor

STINNER Victor added the comment:

> It is not more surprising that specify a "start" argument for sum().

Un pythin, the start attribute of sum is optional.

> I don't think we should worry about third party module which violate the
API. It will crash in any case when the exported function will disappear
and will be replaced by macros.

The compilation will fail, whereas it will crash with your patch.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2012-12-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think it is redundant and useless, but do as you want.

--
Added file: http://bugs.python.org/file28428/unicode_findmaxchar_2.patch

___
Python tracker 

___diff -r a7c9869a5114 Include/unicodeobject.h
--- a/Include/unicodeobject.h   Mon Dec 24 13:17:11 2012 +0200
+++ b/Include/unicodeobject.h   Mon Dec 24 22:52:17 2012 +0200
@@ -739,10 +739,15 @@
 #ifndef Py_LIMITED_API
 /* Compute the maximum character of the substring unicode[start:end].
Return 127 for an empty string. */
-PyAPI_FUNC(Py_UCS4) _PyUnicode_FindMaxChar (
+#define _PyUnicode_FindMaxChar(unicode, start, end) \
+_PyUnicode_FindMaxChar2((unicode), (start), (end), 127)
+/* Compute the maximum character of the substring unicode[start:end] and
+   maxchar. */
+PyAPI_FUNC(Py_UCS4) _PyUnicode_FindMaxChar2(
 PyObject *unicode,
 Py_ssize_t start,
-Py_ssize_t end);
+Py_ssize_t end,
+Py_UCS4 maxchar);
 #endif
 
 /* Copy the string into a UCS4 buffer including the null character if copy_null
diff -r a7c9869a5114 Objects/stringlib/unicode_format.h
--- a/Objects/stringlib/unicode_format.hMon Dec 24 13:17:11 2012 +0200
+++ b/Objects/stringlib/unicode_format.hMon Dec 24 22:52:17 2012 +0200
@@ -886,8 +886,9 @@
  &format_spec_needs_expanding)) == 2) {
 sublen = literal.end - literal.start;
 if (sublen) {
-maxchar = _PyUnicode_FindMaxChar(literal.str,
- literal.start, literal.end);
+maxchar = _PyUnicode_FindMaxChar2(literal.str,
+  literal.start, literal.end,
+  writer->maxchar);
 err = _PyUnicodeWriter_Prepare(writer, sublen, maxchar);
 if (err == -1)
 return 0;
diff -r a7c9869a5114 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c   Mon Dec 24 13:17:11 2012 +0200
+++ b/Objects/unicodeobject.c   Mon Dec 24 22:52:17 2012 +0200
@@ -2002,24 +2002,31 @@
 }
 
 Py_UCS4
-_PyUnicode_FindMaxChar(PyObject *unicode, Py_ssize_t start, Py_ssize_t end)
+_PyUnicode_FindMaxChar2(PyObject *unicode, Py_ssize_t start, Py_ssize_t end,
+Py_UCS4 maxchar)
 {
 enum PyUnicode_Kind kind;
 void *startptr, *endptr;
+Py_UCS4 maxchar2;
 
 assert(PyUnicode_IS_READY(unicode));
 assert(0 <= start);
 assert(end <= PyUnicode_GET_LENGTH(unicode));
 assert(start <= end);
 
-if (start == 0 && end == PyUnicode_GET_LENGTH(unicode))
-return PyUnicode_MAX_CHAR_VALUE(unicode);
+if (start == 0 && end == PyUnicode_GET_LENGTH(unicode)) {
+maxchar2 = PyUnicode_MAX_CHAR_VALUE(unicode);
+return MAX_MAXCHAR(maxchar, maxchar2);
+}
 
 if (start == end)
-return 127;
+return maxchar;
 
 if (PyUnicode_IS_ASCII(unicode))
-return 127;
+return maxchar;
+
+if (maxchar >= PyUnicode_MAX_CHAR_VALUE(unicode))
+return maxchar;
 
 kind = PyUnicode_KIND(unicode);
 startptr = PyUnicode_DATA(unicode);
@@ -2027,15 +2034,19 @@
 startptr = (char *)startptr + start * kind;
 switch(kind) {
 case PyUnicode_1BYTE_KIND:
-return ucs1lib_find_max_char(startptr, endptr);
+maxchar2 = ucs1lib_find_max_char(startptr, endptr);
+break;
 case PyUnicode_2BYTE_KIND:
-return ucs2lib_find_max_char(startptr, endptr);
+maxchar2 = ucs2lib_find_max_char(startptr, endptr);
+break;
 case PyUnicode_4BYTE_KIND:
-return ucs4lib_find_max_char(startptr, endptr);
+maxchar2 = ucs4lib_find_max_char(startptr, endptr);
+break;
 default:
 assert(0);
 return 0;
 }
+return MAX_MAXCHAR(maxchar, maxchar2);
 }
 
 /* Ensure that a string uses the most efficient storage, if it is not the
@@ -13740,7 +13751,7 @@
 Py_ssize_t pindex;
 Py_UCS4 signchar;
 Py_ssize_t buflen;
-Py_UCS4 maxchar, bufmaxchar;
+Py_UCS4 bufmaxchar;
 Py_ssize_t sublen;
 _PyUnicodeWriter *writer = &ctx->writer;
 Py_UCS4 fill;
@@ -13793,7 +13804,7 @@
 arg->width = len;
 
 /* Prepare the writer */
-bufmaxchar = 127;
+bufmaxchar = writer->maxchar;
 if (!(arg->flags & F_LJUST)) {
 if (arg->sign) {
 if ((arg->width-1) > len)
@@ -13804,8 +13815,7 @@
 bufmaxchar = MAX_MAXCHAR(bufmaxchar, fill);
 }
 }
-maxchar = _PyUnicode_FindMaxChar(str, 0, pindex+len);
-bufmaxchar = MAX_MAXCHAR(bufmaxchar, maxchar);
+bufmaxchar = _PyUnicode_FindMaxChar2(str, 0, pindex+len, bufmaxchar);
 buflen = arg->width;
 if (arg->sign && len == arg->width)
 buflen++;
@@ -13975,8 +13985,9 @@
 ctx.writer.overallocate = 0;
 }
 sublen = ctx.fmtpos - nonfmtpos;
- 

[issue16757] Faster _PyUnicode_FindMaxChar()

2013-04-02 Thread STINNER Victor

STINNER Victor added the comment:

Related commit:

changeset:   83066:b5d5f422299f
tag: tip
user:Victor Stinner 
date:Wed Apr 03 01:48:39 2013 +0200
files:   Include/unicodeobject.h Lib/test/test_format.py 
Objects/stringlib/unicode_format.h Objects/unicodeobject.c
description:
Add _PyUnicodeWriter_WriteSubstring() function

Write a function to enable more optimizations:

 * If the substring is the whole string and overallocation is disabled, just
   keep a reference to the string, don't copy characters
 * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
   possible

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2013-04-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 7132bca093ad by Victor Stinner in branch 'default':
Close #16757: Avoid calling the expensive _PyUnicode_FindMaxChar() function
http://hg.python.org/cpython/rev/7132bca093ad

--
nosy: +python-dev
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16757] Faster _PyUnicode_FindMaxChar()

2013-04-02 Thread STINNER Victor

STINNER Victor added the comment:

> New changeset 7132bca093ad by Victor Stinner in branch 'default':
> Close #16757: Avoid calling the expensive _PyUnicode_FindMaxChar() function

This changeset is almost the same than unicode_findmaxchar_2.patch. I prefered 
to keep the API unchanged and not call _PyUnicode_FindMaxChar() instead of 
adding a test in the function to exit it.

There is just a minor difference in Python/formatter_unicode.c: the test for 
_PyUnicode_FindMaxChar() is done after the test on format->fill_char (which 
should avoid a call to for _PyUnicode_FindMaxChar() if fill_char is wider than 
the actual buffer).

Thanks Serhiy for your great idea!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com