[issue13136] speed-up conversion between unicode widths

Meador Inge Sat, 08 Oct 2011 17:49:58 -0700

Meador Inge <mead...@gmail.com> added the comment:

On Sat, Oct 8, 2011 at 5:34 PM, Antoine Pitrou <rep...@bugs.python.org> wrote:


> Antoine Pitrou <pit...@free.fr> added the comment:
>
>> Before going further with this, I'd suggest you have a look at your
>> compiler settings.
>
> They are set by the configure script:
>
> gcc -pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall
> -Wstrict-prototypes    -I. -I./Include    -DPy_BUILD_CORE -o
> Objects/unicodeobject.o Objects/unicodeobject.c
>
>> Such optimizations are normally performed by the
>> compiler and don't need to be implemented in C, making maintenance
>> harder.
>
> The fact that the glibc includes such optimization (in much more
> sophisticated form) suggests to me that many compilers don't perform
> these optimizations automically.

I agree.  This is more of an optimized runtime library problem than
code optimization problem.

>> I tested using memchr() when writing those "naive" loops.
>
> memchr() is mentioned in another issue, #13134.

Yeah, this conversation is really more relevant to issue13134, but I will
reply to these here anyway ....

>> memchr()
>> is inlined by the compiler just like the direct loop
>
> I don't think so. If you look at the glibc's memchr() implementation,
> it's a sophisticated routine, not a trivial loop. Perhaps you're
> thinking about memcpy().

Without link-time optimization enabled, I doubt the toolchain can
"inline" 'memchr'
in the traditional sense (i.e. inserting the body of the routine
inline).  Even if it could,
the inline heuristics would most likely choose not to.  I don't think we use LTO
with GCC, but I think we might with VC++.

GCC does something else called builtin folding that is more likely.
For example,
'memchr ("bca", 'c', 3)' gets replace with instructions to compute a pointer
index into "bca".  This won't happen in this case because all of the 'memchr'
arguments are all variable.

>> and the generated
>> code for the direct version is often easier to optimize for the compiler
>> than the memchr() one, since it receives more knowledge about the used
>> data types.
>
> ?? Data types are fixed in the memchr() definition, there's no knowledge
> to be gained by inlining.

I think what Marc-Andre is alluding to is that the first parameter of
'memchr' is 'void *'
which could (in theory) limit optimization opportunities.  Where as if
it knew that
the data being searched is a 'char *' or something it could take
advantage of that.

----------
nosy: +meador.inge

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13136>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue13136] speed-up conversion between unicode widths

Reply via email to