Re: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
Larry Hastings <[EMAIL PROTECTED]> wrote:
> But I'm open to suggestions, on this or any other aspect of the patch.

As Martin, I, and others have suggested, direct the patch towards Python 3.x unicode text. Also, don't be surprised if Guido says no...

http://mail.python.org/pipermail/python-3000/2006-August/003334.html

In that message he talks about why view+string, string+view, and view+view should all return strings. Some of those arguments are not quite applicable in this case, because with your implementation all additions can return a 'view'. However, he also states the following with regard to strings vs. views (an earlier variant of the "lazy strings" you propose): "Because they can have such different performance and memory usage characteristics, it's not right to treat them as the same type." - GvR

This suggests (at least to me) that unifying the 'lazy string' with the 2.x string is basically out of the question, which brings me back to my earlier suggestion: make it into a wrapper that could be used with 3.x bytes, 3.x text, and perhaps others.

 - Josiah

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
On 2006/10/20, Larry Hastings wrote:
> I'm ready to post the patch.

Sheesh! Where does the time go. I've finally found the time to re-validate and post the patch. It's SF.net patch #1590352:

http://sourceforge.net/tracker/index.php?func=detail&aid=1590352&group_id=5470&atid=305470

I've attached both the patch itself (against the current 2.6 revision, 52618) and a lengthy treatise on the patch and its ramifications as I understand them.

I've also added one more experimental change: a new string method, str.simplify(). All it does is force a lazy concatenation / lazy slice to render. (If the string isn't a lazy string, or it's already been rendered, str.simplify() is a no-op.) The idea is, if you know these consarned "lazy slices" are giving you the oft-cited horrible memory usage scenario, you can tune your app by forcing the slices to render and drop their references. 99% of the time you don't care, and you enjoy the minor speedup. The other 1% of the time, you call .simplify() and your code behaves as it did under 2.5.

Is this the right approach? I dunno. So far I like it better than the alternatives. But I'm open to suggestions, on this or any other aspect of the patch.

Cheers,

/larry/
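[A pure-Python toy illustrating the str.simplify() idea described above. The real patch works inside CPython's stringobject.c in C; the class name, attributes, and structure here are invented purely for demonstration.]

```python
# Illustrative sketch only: a toy "lazy concatenation" object with a
# simplify() method that forces rendering and drops the references to
# the component strings, mirroring the tuning hook Larry describes.
class LazyConcat:
    def __init__(self, left, right):
        self._parts = [left, right]   # references kept; nothing copied yet
        self._rendered = None

    def simplify(self):
        """Force rendering; a no-op if already rendered."""
        if self._rendered is None:
            self._rendered = "".join(self._parts)
            self._parts = None        # release the component strings
        return self._rendered

    def __str__(self):
        return self.simplify()

s = LazyConcat("hello, ", "world")
assert str(s) == "hello, world"
assert s._parts is None   # after rendering, the parts are released
```

In the patched interpreter this would of course be invisible: `a + b` would build the lazy object, and `.simplify()` would be the only new user-visible surface.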
Re: [Python-Dev] The "lazy strings" patch
Jean-Paul Calderone wrote:
> On Mon, 23 Oct 2006 07:58:25 -0700, Larry Hastings <[EMAIL PROTECTED]> wrote:
>> [snip]
>> If external Python extension modules are as well-behaved as the shipping
>> Python source tree, there simply wouldn't be a problem. Python source is
>> delightfully consistent about using the macro PyString_AS_STRING() to get
>> at the creamy char *center of a PyStringObject *. When code religiously
>> uses that macro (or calls PyString_AsString() directly), all it needs is a
>> recompile with the current stringobject.h and it will Just Work.
>>
>> I genuinely don't know how many external Python extension modules are
>> well-behaved in this regard. But in case it helps: I just checked PIL,
>> NumPy, PyWin32, and SWIG, and all of them were well-behaved.
>
> FWIW, http://www.google.com/codesearch?q=+ob_sval

Possibly more enlightening (we *know* string objects play with this field!):

http://www.google.com/codesearch?hl=en&lr=&q=ob_sval+-stringobject.%5Bhc%5D&btnG=Search

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] The "lazy strings" patch
On Oct 24, 2006, at 11:09 AM, Jack Jansen wrote:
> Look at packages such as win32, PyObjC, ctypes, bridges between Python and
> other languages, etc. That's where implementors are tempted to bend the
> rules of Official APIs for the benefit of serious optimizations.

PyObjC should be safe in this regard, I try to conform to the official rules :-)

I do use PyString_AS_STRING outside of the GIL in other extensions though; the lazy strings patch would break that. My code is of course bending the rules here and can easily be fixed by introducing a temporary variable.

Ronald
Re: [Python-Dev] The "lazy strings" patch
On 23-Oct-2006, at 16:58, Larry Hastings wrote:
> I genuinely don't know how many external Python extension modules are
> well-behaved in this regard. But in case it helps: I just checked PIL,
> NumPy, PyWin32, and SWIG, and all of them were well-behaved.
>
> Apart from stringobject.c, there was exactly one spot in the Python source
> tree which made assumptions about the structure of PyStringObjects
> (Mac/Modules/macos.c). It's in the block starting with the comment "This
> is a hack:". Note that this is unfixed in my patch, so just now all code
> using that self-avowed "hack" will break.

As the author of that hack, that gives me an idea for where you should look for code that will break: code that tries to expose low-level C interfaces to Python. (That hack replaced an even earlier, worse hack, which took the id() of a string in Python and added a fixed number to it to get at the address of the string, to fill it into a structure. Blush.)

Look at packages such as win32, PyObjC, ctypes, bridges between Python and other languages, etc. That's where implementors are tempted to bend the rules of Official APIs for the benefit of serious optimizations.

--
Jack Jansen, <[EMAIL PROTECTED]>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings schrieb:
> Am I correct in understanding that changing the Python minor revision
> number (2.5 -> 2.6) requires external modules to recompile? (It
> certainly does on Windows.)

There is an ongoing debate on that. The original intent was that you normally *shouldn't* have to recompile modules just because the Python version changes. Instead, you should do so when PYTHON_API_VERSION changes. Of course, a change such as yours would also cause a change to PYTHON_API_VERSION.

Then, even if PYTHON_API_VERSION changes, you aren't *required* to recompile your extension modules. Instead, you get a warning that the API version is different and *might* require recompilation: it does require recompilation if the extension module relies on some of the changed API. With this change, people not recompiling their extension modules would likely see Python crash rather quickly after seeing the warning about incompatible APIs.

Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
[EMAIL PROTECTED] schrieb:
>>> Anyway, it was my intent to post the patch and see what happened.
>>> Being a first-timer at this, and not having even read the core
>>> development mailing lists for very long, I had no idea what to
>>> expect. Though I genuinely didn't expect it to be this brusque.
>
> Martin> I could have told you :-) The "problem" really is that you are
> Martin> suggesting a major, significant change to the implementation of
> Martin> Python, and one that doesn't fix an obvious bug.
>
> Come on Martin. Give Larry a break.

I'm seriously not complaining, I'm explaining.

> Lots of changes have been accepted to the Python core which weren't
> obvious "bug fixes".

Surely many new features have been implemented over time, but in many cases they weren't really "big changes", in the sense that you could ignore them if you didn't like them. That wouldn't be so in this case: as the string type is very fundamental, people take a greater interest in its implementation.

> In fact, I seem to recall a sprint held recently in Reykjavik where the
> whole point was just to make Python faster.

That's true. I also recall there were serious complaints about the outcome of this sprint, and the changes to the struct module in particular. Still, the struct module is of lesser importance than the string type, so the concerns were smaller.

> I believe that was exactly Larry's point in posting the patch. The "one
> obvious way to do" concatenation and slicing for one of the most heavily
> used types in Python appears to be faster. That seems like a win to me.

Have you reviewed the patch and can you vouch for its correctness, even in boundary cases? Have you tested it in a real application and found a real performance improvement? I have done neither, so I can't speak to the advantages of the patch. I didn't actually object to the inclusion of the patch, either. I was merely stating what I think the problems with "that kind of" patch are.

Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
On Mon, 23 Oct 2006 09:07:51 -0700, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> "Paul Moore" <[EMAIL PROTECTED]> wrote:
>> I had picked up on this comment, and I have to say that I had been a
>> little surprised by the resistance to the change based on the "code
>> would break" argument, when you had made such a thorough attempt to
>> address this. Perhaps others had missed this point, though.
>
> I'm also concerned about future usability.

Me too (perhaps in a different way though).

> Word in the Py3k list is that Python 2.6 will be just about the last
> Python in the 2.x series, and by directing his implementation at only
> Python 2.x strings, he's just about guaranteeing obsolescence.

People will be using 2.x for a long time to come. And in the long run, isn't all software obsolete? :)

> By building with unicode and/or objects with a buffer interface in mind,
> Larry could build with both 2.x and 3.x in mind, and his code wouldn't
> be obsolete the moment it was released.

(I'm not sure what the antecedent of "it" is in the above; I'm going to assume it's Python 3.x.) Supporting unicode strings and objects providing the buffer interface seems like a good idea in general, even disregarding Py3k. Starting with str is reasonable though, since there's still plenty of code that will benefit from this change, if it is indeed a beneficial change.

Larry, I'm going to try to do some benchmarks against Twisted using this patch, but given my current time constraints, you may be able to beat me to this :) If you're interested, Twisted [EMAIL PROTECTED] plus this trial plugin:

http://twistedmatrix.com/trac/browser/sandbox/exarkun/merit/trunk

will let you do some gross measurements using the Twisted test suite. I can give some more specific pointers if this sounds like something you'd want to mess with.

Jean-Paul
Re: [Python-Dev] The "lazy strings" patch
"Paul Moore" <[EMAIL PROTECTED]> wrote:
> I had picked up on this comment, and I have to say that I had been a
> little surprised by the resistance to the change based on the "code
> would break" argument, when you had made such a thorough attempt to
> address this. Perhaps others had missed this point, though.

I'm also concerned about future usability. Word in the Py3k list is that Python 2.6 will be just about the last Python in the 2.x series, and by directing his implementation at only Python 2.x strings, he's just about guaranteeing obsolescence. By building with unicode and/or objects with a buffer interface in mind, Larry could build with both 2.x and 3.x in mind, and his code wouldn't be obsolete the moment it was released.

 - Josiah
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings wrote:
> Am I correct in understanding that changing the Python minor revision
> number (2.5 -> 2.6) requires external modules to recompile?

not, in general, on Unix. it's recommended, but things usually work quite well anyway.
Re: [Python-Dev] The "lazy strings" patch
Larry> The only function that *might* return a non-terminated char * is
Larry> PyString_AsUnterminatedString(). This function is static to
Larry> stringobject.c--and I would be shocked if it were ever otherwise.

If it's static to stringobject.c it doesn't need a PyString_ prefix. In fact, I'd argue that it shouldn't have one, so that people reading the code won't miss the "static" and think it is part of the published API.

Larry> Am I correct in understanding that changing the Python minor
Larry> revision number (2.5 -> 2.6) requires external modules to
Larry> recompile?

Yes, in general, though you can often get away without it if you don't mind Python screaming at you about version mismatches.

Skip
Re: [Python-Dev] The "lazy strings" patch
On 10/23/06, Larry Hastings <[EMAIL PROTECTED]> wrote:
> Steve Holden wrote:
>> But it seems to me that the only major issue is the inability to provide
>> zero-byte terminators with this new representation.
>
> I guess I wasn't clear in my description of the patch; sorry about that.
>
> Like "lazy concatenation objects", "lazy slices" render when you call
> PyString_AsString() on them. Before rendering, the lazy slice's ob_sval
> will be NULL. Afterwards it will point to a proper zero-terminated string,
> at which point the object behaves exactly like any other PyStringObject.

I had picked up on this comment, and I have to say that I had been a little surprised by the resistance to the change based on the "code would break" argument, when you had made such a thorough attempt to address this. Perhaps others had missed this point, though.

> I genuinely don't know how many external Python extension modules are
> well-behaved in this regard. But in case it helps: I just checked PIL,
> NumPy, PyWin32, and SWIG, and all of them were well-behaved.

There's code out there which was written to the Python 1.4 API and has not been updated since (I know, I wrote some of it!). I wouldn't call it "well-behaved" (it writes directly into the string's character buffer) but I don't believe it would fail (it only uses PyString_AsString to get the buffer address).

    /* Allocate a Python string object with uninitialised contents. We
     * must do it this way, so that we can modify the string in place
     * later. See the Python source, Objects/stringobject.c, for details.
     */
    result = PyString_FromStringAndSize(NULL, len);
    if (result == NULL)
        return NULL;

    p = PyString_AsString(result);
    while (*str) {
        if (*str == '\n')
            *p = '\0';
        else
            *p = *str;
        ++p;
        ++str;
    }

> Am I correct in understanding that changing the Python minor revision
> number (2.5 -> 2.6) requires external modules to recompile? (It certainly
> does on Windows.) If so, I could mitigate the problem by renaming ob_sval.
> That way, code making explicit reference to it would fail to compile, which
> I feel is better than silently recompiling unsafe code.

I think you've covered pretty much all the possible backward compatibility bases. A sufficiently evil extension could blow up, I guess, but that's always going to be true.

OTOH, I don't have a comment on the desirability of the patch per se, as (a) I've never been hit by the speed issue, and (b) I'm thoroughly indoctrinated, so I always use ''.join() :-)

Paul.
Re: [Python-Dev] The "lazy strings" patch
On Mon, 23 Oct 2006 07:58:25 -0700, Larry Hastings <[EMAIL PROTECTED]> wrote:
> [snip]
> If external Python extension modules are as well-behaved as the shipping
> Python source tree, there simply wouldn't be a problem. Python source is
> delightfully consistent about using the macro PyString_AS_STRING() to get
> at the creamy char *center of a PyStringObject *. When code religiously
> uses that macro (or calls PyString_AsString() directly), all it needs is a
> recompile with the current stringobject.h and it will Just Work.
>
> I genuinely don't know how many external Python extension modules are
> well-behaved in this regard. But in case it helps: I just checked PIL,
> NumPy, PyWin32, and SWIG, and all of them were well-behaved.

FWIW, http://www.google.com/codesearch?q=+ob_sval

Jean-Paul
Re: [Python-Dev] The "lazy strings" patch
Steve Holden wrote:
> But it seems to me that the only major issue is the inability to provide
> zero-byte terminators with this new representation.

I guess I wasn't clear in my description of the patch; sorry about that.

Like "lazy concatenation objects", "lazy slices" render when you call PyString_AsString() on them. Before rendering, the lazy slice's ob_sval will be NULL. Afterwards it will point to a proper zero-terminated string, at which point the object behaves exactly like any other PyStringObject. The only function that *might* return a non-terminated char * is PyString_AsUnterminatedString(). This function is static to stringobject.c--and I would be shocked if it were ever otherwise.

> If there were any reliable way to make sure these objects never got
> passed to extension modules then I'd say "go for it".

If external Python extension modules are as well-behaved as the shipping Python source tree, there simply wouldn't be a problem. Python source is delightfully consistent about using the macro PyString_AS_STRING() to get at the creamy char *center of a PyStringObject *. When code religiously uses that macro (or calls PyString_AsString() directly), all it needs is a recompile with the current stringobject.h and it will Just Work.

I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well-behaved.

Apart from stringobject.c, there was exactly one spot in the Python source tree which made assumptions about the structure of PyStringObjects (Mac/Modules/macos.c). It's in the block starting with the comment "This is a hack:". Note that this is unfixed in my patch, so just now all code using that self-avowed "hack" will break.

Am I correct in understanding that changing the Python minor revision number (2.5 -> 2.6) requires external modules to recompile? (It certainly does on Windows.) If so, I could mitigate the problem by renaming ob_sval. That way, code making explicit reference to it would fail to compile, which I feel is better than silently recompiling unsafe code.

Cheers,

larry
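[A rough pure-Python analogue of the render-on-demand behaviour Larry describes: the buffer pointer stays NULL until someone asks for the string data, and only then is the slice copied out. The real mechanism lives in C inside stringobject.c; the names here (`LazySlice`, `as_string`, `_sval`) are invented for illustration.]

```python
# Toy model of a "lazy slice": holds a reference to the base string plus
# a range, and renders (copies) only when its buffer is requested --
# roughly what PyString_AsString() triggers in the patched interpreter.
class LazySlice:
    def __init__(self, base, start, stop):
        self._base = base           # keeps the base string alive
        self._range = (start, stop)
        self._sval = None           # analogue of a NULL ob_sval

    def as_string(self):
        """Analogue of PyString_AsString(): render on first use."""
        if self._sval is None:
            start, stop = self._range
            self._sval = self._base[start:stop]
        return self._sval

big = "x" * 1000 + "needle" + "y" * 1000
view = LazySlice(big, 1000, 1006)
assert view._sval is None           # nothing copied yet
assert view.as_string() == "needle"
assert view._sval == "needle"       # now rendered
```

This also makes the compatibility argument concrete: code that always goes through the accessor (the macro/function) never notices the lazy phase, while code that reaches into the struct directly would see the NULL buffer.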
Re: [Python-Dev] The "lazy strings" patch
[EMAIL PROTECTED] wrote:
>>> Anyway, it was my intent to post the patch and see what happened.
>>> Being a first-timer at this, and not having even read the core
>>> development mailing lists for very long, I had no idea what to
>>> expect. Though I genuinely didn't expect it to be this brusque.
>
> Martin> I could have told you :-) The "problem" really is that you are
> Martin> suggesting a major, significant change to the implementation of
> Martin> Python, and one that doesn't fix an obvious bug.

The "obvious bug" that it fixes is slowness <0.75 wink>.

> Come on Martin. Give Larry a break. Lots of changes have been accepted
> to the Python core which weren't obvious "bug fixes". In fact, I seem to
> recall a sprint held recently in Reykjavik where the whole point was just
> to make Python faster. I believe that was exactly Larry's point in
> posting the patch. The "one obvious way to do" concatenation and slicing
> for one of the most heavily used types in Python appears to be faster.
> That seems like a win to me.

I did point out to Larry when he went to c.l.py with the original patch that he would face resistance, so this hasn't blind-sided him.

But it seems to me that the only major issue is the inability to provide zero-byte terminators with this new representation. Because Larry's proposal for handling this involves the introduction of a new API that can't already be in use in extensions, it's obviously the extension writers who would be given the most problems by this patch. I can understand resistance on that score, and I could understand resistance if there were other clear disadvantages to its implementation, but in their absence it seems like the extension modules are the killers.

If there were any reliable way to make sure these objects never got passed to extension modules then I'd say "go for it". Without that it does seem like a potentially widespread change to the C API that could affect much code outside the interpreter.

This is a great shame. I think Larry showed inventiveness and tenacity to get this far, and deserves credit for his achievements no matter whether or not they get into the core.

regards
Steve

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] The "lazy strings" patch
>> Anyway, it was my intent to post the patch and see what happened.
>> Being a first-timer at this, and not having even read the core
>> development mailing lists for very long, I had no idea what to
>> expect. Though I genuinely didn't expect it to be this brusque.

Martin> I could have told you :-) The "problem" really is that you are
Martin> suggesting a major, significant change to the implementation of
Martin> Python, and one that doesn't fix an obvious bug.

Come on Martin. Give Larry a break. Lots of changes have been accepted to the Python core which weren't obvious "bug fixes". In fact, I seem to recall a sprint held recently in Reykjavik where the whole point was just to make Python faster. I believe that was exactly Larry's point in posting the patch. The "one obvious way to do" concatenation and slicing for one of the most heavily used types in Python appears to be faster. That seems like a win to me.

Skip
Re: [Python-Dev] The "lazy strings" patch
Josiah Carlson wrote:
> Want my advice? Aim for Py3k text as your primary target, but as a
> wrapper, not as the core type (I put the odds at somewhere around 0 for
> such a core type change). If you are good, and want to make guys like
> me happy, you could even make it support the buffer interface for
> non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior
> of bytes and text.

This is still my preferred approach, too - for local optimisation of an algorithm, a string view type strikes me as an excellent idea. For the core data type, though, keeping the behaviour comparatively simple and predictable counterbalances the desire for more speed.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] The "lazy strings" patch
Josiah Carlson wrote:
> It would be a radical change for Python 2.6, and really the 2.x series,
> likely requiring nontrivial changes to extension modules that deal with
> strings, and the assumptions about strings that have held for over a
> decade.

the assumptions hidden in everyone's use of the C-level string API are the main concern here, at least for me; radically changing the internal format is not a new idea, but it's always been held off because we have no idea how people are using the C API.
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings <[EMAIL PROTECTED]> wrote:
> It was/is my understanding that the early days of a new major revision
> was the most judicious time to introduce big changes. If I had offered
> these patches six months ago for 2.5, they would have had zero chance of
> acceptance. But 2.6 is in its infancy, and so I assumed now was the
> time to discuss sea-change patches like this.

It would be a radical change for Python 2.6, and really the 2.x series, likely requiring nontrivial changes to extension modules that deal with strings, and the assumptions about strings that have held for over a decade. I think 2.6 as an option is a non-starter. Think Py3k, and really, think bytes and unicode.

> The "stringview" discussion you cite was largely speculation, and as I
> recall there were users in both camps ("it'll use more memory overall"
> vs "no it won't"). And, while I saw a test case with microbenchmarks,
> and a "proof-of-concept" where a stringview was a separate object from a
> string, I didn't see any real-world applications tested with this
> approach.
>
> Rather than start in on speculation about it, I have followed that old
> maxim of "show me the code". I've produced actual code that works with
> real strings in Python. I see this as an opportunity for Pythonistas to
> determine the facts for themselves. Now folks can try the patch with
> these real-world applications you cite and find out how it really
> behaves. (Although I realize the Python community is under no
> obligation to do so.)

One of the big concerns brought up in the stringview discussion was that of users expecting one thing and getting another. Slicing a larger string producing a 'view', which then keeps the larger string alive, would be a surprise. By making it a separate object that just *knows* about strings (or really, anything that offers a buffer interface), I was able to make an object that was 1) flexible, 2) usable in any Python, 3) doesn't change the core assumptions about Python, and 4) is expandable to beyond just *strings*. Reason #4 was my primary reason for writing it, because str disappears in Py3k, which is closer to happening than most of us realize.

> If experimentation is the best thing here, I'd be happy to revise the
> patch to facilitate it. For instance, I could add command-line
> arguments letting you tweak the run-time behavior of the patch, like
> changing the minimum size of a lazy slice. Perhaps add code so there's
> a tweakable minimum size of a lazy concatenation too. Or a tweakable
> minimum *ratio* necessary for a lazy slice. I'm open to suggestions.

I believe that would be a waste of time. The odds of it making it into Python 2.x without significant core developer support are pretty close to None, which in Python 2.x is less than 0. I've been down that road; nothing good lies that way.

Want my advice? Aim for Py3k text as your primary target, but as a wrapper, not as the core type (I put the odds at somewhere around 0 for such a core type change). If you are good, and want to make guys like me happy, you could even make it support the buffer interface for non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior of bytes and text.

 - Josiah
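[A minimal sketch of the "wrapper, not core type" approach Josiah advocates: a view object built on the buffer interface, so it works over bytes, array, mmap, and so on. This is my illustration using memoryview, not Josiah's actual stringview code; the class and method names are invented.]

```python
# Sketch: a view wrapper over anything exporting the buffer interface.
# Slicing a BufView yields another BufView over the same buffer with no
# copying; tobytes() materialises a copy when one is actually needed.
class BufView:
    def __init__(self, obj, start=0, stop=None):
        self._mv = memoryview(obj)[start:stop]

    def __len__(self):
        return len(self._mv)

    def __getitem__(self, i):
        if isinstance(i, slice):
            v = BufView.__new__(BufView)   # sub-view, still zero-copy
            v._mv = self._mv[i]
            return v
        return self._mv[i]

    def tobytes(self):
        return self._mv.tobytes()

data = b"the quick brown fox"
v = BufView(data, 4, 9)
assert v.tobytes() == b"quick"
assert v[1:4].tobytes() == b"uic"   # nested views, no copies until tobytes()
```

Because the wrapper is a distinct type, none of the core assumptions about str change, which is exactly the property Guido's quoted objection ("it's not right to treat them as the same type") demands.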
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings schrieb:
> Anyway, it was my intent to post the patch and see what happened. Being
> a first-timer at this, and not having even read the core development
> mailing lists for very long, I had no idea what to expect. Though I
> genuinely didn't expect it to be this brusque.

I could have told you :-) The "problem" really is that you are suggesting a major, significant change to the implementation of Python, and one that doesn't fix an obvious bug. The new code is an order of magnitude more complex than the old one, and the impact that it will have is unknown - but in the worst case, it could have serious negative impact, e.g. if the code is full of errors and causes Python applications to crash in masses. This is, of course, FUD: it is the fear that this might happen, the uncertainty about the quality of the code, and the doubt about the viability of the approach.

There are many aspects to such a change, but my experience is that it primarily takes time. Fredrik Lundh suggested you give up on Python 2.6 and target Python 3.0 right away; it may indeed be the case that Python 2.6 is too close for that kind of change to find enough supporters.

If your primary goal was to contribute to open source, you might want to look at other areas of Python: there are plenty of open bugs ("real bugs" :-), unreviewed patches, etc. For some time, it is more satisfying to work on these, since the likelihood of success is higher.

Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings wrote:
> Martin v. Löwis wrote:
>
> Let's be specific: when there is at least one long-lived small lazy
> slice of a large string, and the large string itself would otherwise
> have been dereferenced and freed, and this small slice is never examined
> by code outside of stringobject.c, this approach means the large string
> becomes long-lived too and thus Python consumes more memory overall. In
> pathological scenarios this memory usage could be characterized as
> "insane".
>
> True dat. Then again, I could suggest some scenarios where this would
> save memory (multiple long-lived large slices of a large string), and
> others where memory use would be a wash (long-lived slices containing
> all or almost all of a large string, or any scenario where slices are
> short-lived). While I think it's clear lazy slices are *faster* on
> average, their overall effect on memory use in real-world Python is not
> yet known. Read on.

I wonder - how expensive would it be for the string slice to have a weak reference, and 'normalize' the slice when the big string is collected? Would the overhead of the weak reference swamp the savings?

--
Talin
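[One way Talin's weak-reference idea can be emulated at the Python level, as a sketch only. In the real patch this would happen in C during deallocation; here a trivial wrapper class stands in for the base string (plain str can't be weak-referenced in CPython), and the base normalizes its surviving slices from __del__, while its data is still reachable. All names are invented; CPython's reference counting makes the `del` deterministic in this toy.]

```python
import weakref

class Base:
    """Stand-in for the big string object."""
    def __init__(self, s):
        self.s = s
        self._slices = weakref.WeakSet()   # track live slices, weakly

    def __del__(self):
        # The base is being collected: hand surviving slices their data
        # so they can copy ("normalize") before the buffer goes away.
        for sl in list(self._slices):
            sl.normalize(self.s)

class WeakSlice:
    def __init__(self, base, start, stop):
        self._base = weakref.ref(base)     # does NOT keep the base alive
        self._range = (start, stop)
        self._copy = None
        base._slices.add(self)

    def normalize(self, data=None):
        if self._copy is None:
            if data is None:
                data = self._base().s      # base still alive on this path
            start, stop = self._range
            self._copy = data[start:stop]
        return self._copy

big = Base("x" * 1000 + "needle" + "y" * 1000)
sl = WeakSlice(big, 1000, 1006)
del big                          # __del__ normalizes the surviving slice
assert sl.normalize() == "needle"
```

This answers half of Talin's question (it is feasible); the cost half - an extra weakref per slice plus callback bookkeeping on every large-string dealloc - would need measurement.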
Re: [Python-Dev] The "lazy strings" patch
Martin v. Löwis wrote:
> It's not clear to me what you want to achieve with these patches, in
> particular, whether you want to see them integrated into Python or not.

I would be thrilled if they were, but it seems less likely with every passing day. If you have some advice on how I might increase the patch's chances I would be all ears.

It was/is my understanding that the early days of a new major revision were the most judicious time to introduce big changes. If I had offered these patches six months ago for 2.5, they would have had zero chance of acceptance. But 2.6 is in its infancy, and so I assumed now was the time to discuss sea-change patches like this.

Anyway, it was my intent to post the patch and see what happened. Being a first-timer at this, and not having even read the core development mailing lists for very long, I had no idea what to expect. Though I genuinely didn't expect it to be this brusque.

> I think this specific approach will find strong resistance.

I'd say the "lazy strings" patch is really two approaches, "lazy concatenation" and "lazy slices". You are right, though; *both* have "found strong resistance".

> Most recently, it was discussed under the name "string view" on the
> Py3k list, see
> http://mail.python.org/pipermail/python-3000/2006-August/003282.html
> Traditionally, the biggest objection is that even small strings may
> consume insane amounts of memory.

Let's be specific: when there is at least one long-lived small lazy slice of a large string, and the large string itself would otherwise have been dereferenced and freed, and this small slice is never examined by code outside of stringobject.c, this approach means the large string becomes long-lived too and thus Python consumes more memory overall. In pathological scenarios this memory usage could be characterized as "insane".

True dat.
Then again, I could suggest some scenarios where this would save memory (multiple long-lived large slices of a large string), and others where memory use would be a wash (long-lived slices containing all or almost all of a large string, or any scenario where slices are short-lived). While I think it's clear lazy slices are *faster* on average, their overall effect on memory use in real-world Python is not yet known. Read on.

> > I bet this generally reduces overall memory usage for slices too.
>
> Channeling Guido: what real-world applications did you study with this
> patch to make such a claim?

I didn't; I don't have any. I must admit to being only a small-scale Python user. Memory use remains about the same in pybench, the biggest Python app I have handy. But, then, it was pretty clearly speculation, not a claim. Yes, I *think* it'd use less memory overall. But I wouldn't *claim* anything yet.

The "stringview" discussion you cite was largely speculation, and as I recall there were users in both camps ("it'll use more memory overall" vs "no it won't"). And, while I saw a test case with microbenchmarks, and a "proof-of-concept" where a stringview was a separate object from a string, I didn't see any real-world applications tested with this approach.

Rather than start in on speculation about it, I have followed that old maxim of "show me the code". I've produced actual code that works with real strings in Python. I see this as an opportunity for Pythonistas to determine the facts for themselves. Now folks can try the patch with these real-world applications you cite and find out how it really behaves. (Although I realize the Python community is under no obligation to do so.)

If experimentation is the best thing here, I'd be happy to revise the patch to facilitate it. For instance, I could add command-line arguments letting you tweak the run-time behavior of the patch, like changing the minimum size of a lazy slice.
Perhaps add code so there's a tweakable minimum size of a lazy concatenation too. Or a tweakable minimum *ratio* necessary for a lazy slice. I'm open to suggestions.

Cheers,

/larry/
Re: [Python-Dev] The "lazy strings" patch
On Sat, 21 Oct 2006, Mark Roberts wrote:
[...]
> If there's a widely recognized argument against this, a link will
> likely sate my curiosity.

Quoting from Martin v. Löwis earlier on the same day you posted:

"""
I think this specific approach will find strong resistance. It has been implemented many times, e.g. (apparently) in NextStep's NSString, and in Java's string type (where a string holds a reference to a character array, a start index, and an end index). Most recently, it was discussed under the name "string view" on the Py3k list, see

http://mail.python.org/pipermail/python-3000/2006-August/003282.html

Traditionally, the biggest objection is that even small strings may consume insane amounts of memory.
"""

John
Re: [Python-Dev] The "lazy strings" patch
Hmm, I have not viewed the patch in question, but I'm curious why we wouldn't want to include such a patch if it were transparent to the user (Python-based or otherwise), especially if it increased performance without sacrificing maintainability or elegance. Further, considering how commonly strings are used in everyday programming, I fail to see why an implementation like this would not be desirable. If there's a widely recognized argument against this, a link will likely sate my curiosity.

Thanks,
Mark

> ---Original Message---
> From: Josiah Carlson <[EMAIL PROTECTED]>
> Subject: Re: [Python-Dev] The "lazy strings" patch
> Sent: 21 Oct '06 22:02
>
> Larry Hastings <[EMAIL PROTECTED]> wrote:
> > I've significantly enhanced my string-concatenation patch, to the
> > point where that name is no longer accurate. So I've redubbed it the
> > "lazy strings" patch.
> [snip]
>
> Honestly, I don't believe that pure strings should be this complicated.
> The implementation of the standard string and unicode type should be as
> simple as possible. The current string and unicode implementations are,
> in my opinion, as simple as possible given Python's needs.
>
> As such, I don't see a need to go mucking about with the standard
> string implementation to make it "lazy" so as to increase performance,
> reduce memory consumption, etc. However, having written a somewhat
> "lazy" string slicing/etc operation class I called a "string view",
> whose discussion and implementation can be found in the py3k list, I do
> believe that having a related type, perhaps with the tree-based
> implementation you have written, or a simple pointer + length variant
> like I have written, would be useful to have available to Python.
>
> I also believe that it smells like a Py3k feature, which suggests that
> you should toss the whole string reliance and switch to unicode, as str
> and unicode become bytes and text in Py3k, with bytes being mutable.
>
> - Josiah
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings <[EMAIL PROTECTED]> wrote:
> I've significantly enhanced my string-concatenation patch, to the point
> where that name is no longer accurate. So I've redubbed it the "lazy
> strings" patch.
[snip]

Honestly, I don't believe that pure strings should be this complicated. The implementation of the standard string and unicode type should be as simple as possible. The current string and unicode implementations are, in my opinion, as simple as possible given Python's needs.

As such, I don't see a need to go mucking about with the standard string implementation to make it "lazy" so as to increase performance, reduce memory consumption, etc. However, having written a somewhat "lazy" string slicing/etc operation class I called a "string view", whose discussion and implementation can be found in the py3k list, I do believe that having a related type, perhaps with the tree-based implementation you have written, or a simple pointer + length variant like I have written, would be useful to have available to Python.

I also believe that it smells like a Py3k feature, which suggests that you should toss the whole string reliance and switch to unicode, as str and unicode become bytes and text in Py3k, with bytes being mutable.

- Josiah
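The "simple pointer + length variant" Josiah describes can be approximated in a few lines of pure Python. This is a hypothetical sketch of the idea, not the actual py3k-list implementation: the view records a base string plus start/stop offsets, re-slicing stays lazy, and characters are copied only when the view is rendered to a real string.

```python
class StringView:
    # A separate view type, not a replacement for str: it shares the
    # base string's storage and copies only on str() conversion.
    __slots__ = ('base', 'start', 'stop')

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if isinstance(i, slice):
            start, stop, step = i.indices(len(self))
            if step == 1:           # re-slicing a view stays lazy
                return StringView(self.base,
                                  self.start + start, self.start + stop)
        return str(self)[i]         # anything else renders first

    def __str__(self):              # rendering: copy at last
        return self.base[self.start:self.stop]

v = StringView("abcdefgh", 1, 6)    # views "bcdef" without copying
w = v[1:3]                          # still a view into the same base
```

The memory trade-off discussed in the thread is visible here: `w` keeps the whole of `v.base` alive until it is rendered.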
Re: [Python-Dev] The "lazy strings" patch
See also the Cedar Ropes work:

http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

Bill
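The core trick of the ropes paper (concatenation builds a binary tree in O(1); producing the flat string is deferred) fits in a few lines of Python. This toy omits the paper's rebalancing and the depth limits Larry's patch enforces; the iterative flatten hints at why those limits matter for deeply right- or left-leaning trees:

```python
class Rope:
    # A rope node: either a leaf (right is None, left is a str) or an
    # internal node whose children are ropes or strings.
    def __init__(self, left, right=None):
        self.left, self.right = left, right
        self.length = len(left) + (len(right) if right is not None else 0)

    def __len__(self):
        return self.length

    def __add__(self, other):       # O(1): no characters are copied here
        return Rope(self, other)

    def flatten(self):
        # Iterative left-to-right walk, so a deep tree doesn't blow
        # the interpreter's recursion limit.
        out, stack = [], [self]
        while stack:
            node = stack.pop()
            if isinstance(node, Rope):
                if node.right is not None:
                    stack.append(node.right)
                stack.append(node.left)
            else:
                out.append(node)
        return ''.join(out)

r = Rope("spam") + "eggs" + "ham"   # tree of three pieces, no copying yet
```

Each `+` above is constant-time; the single `''.join` in `flatten()` does all the copying at once, which is essentially why lazy concatenation can match the `"".join(x)` idiom.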
Re: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
Larry Hastings schrieb:
> I've significantly enhanced my string-concatenation patch, to the point
> where that name is no longer accurate. So I've redubbed it the "lazy
> strings" patch.

It's not clear to me what you want to achieve with these patches, in particular, whether you want to see them integrated into Python or not.

> The major new feature is that string *slices* are also represented with
> a lazy-evaluation placeholder for the actual string, just as
> concatenated strings were in my original patch. The lazy slice object
> stores a reference to the original PyStringObject * it is sliced from,
> and the desired start and stop slice markers. (It only supports step =
> 1.)

I think this specific approach will find strong resistance. It has been implemented many times, e.g. (apparently) in NextStep's NSString, and in Java's string type (where a string holds a reference to a character array, a start index, and an end index). Most recently, it was discussed under the name "string view" on the Py3k list, see

http://mail.python.org/pipermail/python-3000/2006-August/003282.html

Traditionally, the biggest objection is that even small strings may consume insane amounts of memory.

> Its ob_sval is NULL until the string is rendered--but that rarely
> happens! Not only does this mean string slices are faster, but I bet
> this generally reduces overall memory usage for slices too.

Channeling Guido: what real-world applications did you study with this patch to make such a claim?

> I'm ready to post the patch. However, as a result of this work, the
> description on the original patch page is really no longer accurate:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>
> Shall I close/delete that patch and submit a new patch with a more
> modern description? After all, there's not a lot of activity on the old
> patch page...

Closing the old issue and opening a new one is fine.
Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
Talin wrote:
> Interesting - is it possible that the same technique could be used to
> hide differences in character width? Specifically, if I concatenate an
> ascii string with a UTF-32 string, can the up-conversion to UTF-32 also
> be done lazily?

of course. and if all you do with the result is write it to a UTF-8 stream, it doesn't need to be done at all.

this requires a slightly more elaborate C-level API than today's PyString_AS_STRING API, though... (which is why this whole exercise belongs on the Python 3000 lists, not on python-dev for 2.X)
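Fredrik's point can be illustrated with a toy in modern Python, using bytes vs. str to stand in for narrow vs. wide string pieces (the function and names are hypothetical): if the lazy concatenation keeps its pieces rather than building a widened intermediate string, each piece can be encoded straight to the output stream and no up-conversion ever happens.

```python
import io

def write_utf8(stream, pieces):
    # Encode each piece of a "lazy concatenation" directly to the
    # stream: no widened intermediate copy of the whole result is built.
    for piece in pieces:
        if isinstance(piece, str):
            stream.write(piece.encode('utf-8'))   # wide piece: encode now
        else:
            stream.write(piece)                   # already 8-bit data

buf = io.BytesIO()
write_utf8(buf, [b'ascii part, ', 'wide part: caf\u00e9'])
```

Only the wide piece pays any encoding cost; the ASCII piece is passed through untouched.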
Re: [Python-Dev] The "lazy strings" patch
Interesting - is it possible that the same technique could be used to hide differences in character width? Specifically, if I concatenate an ascii string with a UTF-32 string, can the up-conversion to UTF-32 also be done lazily? If that could be done efficiently, it would resolve some outstanding issues that have come up on the Python-3000 list with regards to str/unicode convergence.

Larry Hastings wrote:
> I've significantly enhanced my string-concatenation patch, to the point
> where that name is no longer accurate. So I've redubbed it the "lazy
> strings" patch.
>
> The major new feature is that string *slices* are also represented with
> a lazy-evaluation placeholder for the actual string, just as
> concatenated strings were in my original patch. The lazy slice object
> stores a reference to the original PyStringObject * it is sliced from,
> and the desired start and stop slice markers. (It only supports step =
> 1.) Its ob_sval is NULL until the string is rendered--but that rarely
> happens! Not only does this mean string slices are faster, but I bet
> this generally reduces overall memory usage for slices too.
>
> Now, one rule of the Python programming API is that "all strings are
> zero-terminated". That part of the API makes the life of a Python
> extension author sane--they don't have to deal with some exotic Python
> string class, they can just assume C-style strings everywhere.
> Ordinarily, this means a string slice couldn't simply point into the
> original string; if it did, and you executed
>     x = "abcde"
>     y = x[1:4]
> internally y->ob_sval[3] would not be 0, it would be 'e', breaking the
> API's rule about strings.
>
> However! When a PyStringObject lives out its life purely within the
> Python VM, the only code that strenuously examines its internals is
> stringobject.c. And that code almost never needs the trailing zero*.
> So I've added a new static method in stringobject.c: >char * PyString_AsUnterminatedString(PyStringObject *) > If you call it on a lazy-evaluation slice object, it gives you back a > pointer into the original string's ob_sval. The s->ob_size'th element > of this *might not* be zero, but if you call this function you're saying > that's a-okay, you promise not to look at it. (If the PyStringObject * > is any other variety, it calls into PyString_AsString, which renders > string concatenation objects then returns ob_sval.) > > Again: this behavior is *never* observed by anyone outside of > stringobject.c. External users of PyStringObjects call > PyString_AS_STRING(), which renders all lazy concatenation and lazy > slices so they look just like normal zero-terminated PyStringObjects. > With my patch applied, trunk still passes all expected tests. > > Of course, lazy slice objects aren't just for literal slices created > with [x:y]. There are lots of string methods that return what are > effectively string slices, like lstrip() and split(). > > With this code in place, string slices that aren't examined by modules > are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10 > (whatever that means)) while collecting some statistics. When it > finished, the interpreter had created a total of 640,041 lazy slices, of > which only *19* were ever rendered. > > > Apart from lazy slices, there's only one more enhancement when compared > with v1: string prepending now reuses lazy concatenation objects much > more often. There was an optimization in string_concatenate > (Python/ceval.c) that said: "if the left-side string has two references, > and we're about to overwrite the second reference by storing this > concatenation to an object, tell that object to drop its reference". > That often meant the reference on the string dropped to 1, which meant > PyString_Resize could just resize the left-side string in place and > append the right-side. 
> I modified it so it drops the reference to the right-hand operand too.
> With this change, even with a reduction in the allowable stack depth
> for right-hand recursion (so it's less likely to blow the stack), I was
> able to prepend over 86k strings before it forced a render. (Oh, for
> the record: I ensure depth limits are enforced when combining lazy
> slices and lazy concatenations, so you still won't blow your stack when
> you mix them together.)
>
> Here are the highlights of a single apples-to-apples pybench run, 2.6
> trunk revision 52413 ("this") versus that same revision with my patch
> applied ("other"):
>
>     Test                      minimum run-time          average run-time
>                               this    other    diff     this    other    diff
>     -------------------------------------------------------------------------
>     ConcatStrings:           204ms     76ms  +168.4%   213ms     77ms  +177.7%
>     CreateStringsWithConcat: 159ms    138ms   +15.7%   163ms    142ms   +15.1%
>     Stri
[Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate. So I've redubbed it the "lazy strings" patch.

The major new feature is that string *slices* are also represented with a lazy-evaluation placeholder for the actual string, just as concatenated strings were in my original patch. The lazy slice object stores a reference to the original PyStringObject * it is sliced from, and the desired start and stop slice markers. (It only supports step = 1.) Its ob_sval is NULL until the string is rendered--but that rarely happens! Not only does this mean string slices are faster, but I bet this generally reduces overall memory usage for slices too.

Now, one rule of the Python programming API is that "all strings are zero-terminated". That part of the API makes the life of a Python extension author sane--they don't have to deal with some exotic Python string class, they can just assume C-style strings everywhere. Ordinarily, this means a string slice couldn't simply point into the original string; if it did, and you executed

    x = "abcde"
    y = x[1:4]

internally y->ob_sval[3] would not be 0, it would be 'e', breaking the API's rule about strings.

However! When a PyStringObject lives out its life purely within the Python VM, the only code that strenuously examines its internals is stringobject.c. And that code almost never needs the trailing zero*. So I've added a new static method in stringobject.c:

    char * PyString_AsUnterminatedString(PyStringObject *)

If you call it on a lazy-evaluation slice object, it gives you back a pointer into the original string's ob_sval. The s->ob_size'th element of this *might not* be zero, but if you call this function you're saying that's a-okay, you promise not to look at it. (If the PyStringObject * is any other variety, it calls into PyString_AsString, which renders string concatenation objects then returns ob_sval.)

Again: this behavior is *never* observed by anyone outside of stringobject.c.
External users of PyStringObjects call PyString_AS_STRING(), which renders all lazy concatenation and lazy slices so they look just like normal zero-terminated PyStringObjects. With my patch applied, trunk still passes all expected tests.

Of course, lazy slice objects aren't just for literal slices created with [x:y]. There are lots of string methods that return what are effectively string slices, like lstrip() and split().

With this code in place, string slices that aren't examined by modules are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10 (whatever that means)) while collecting some statistics. When it finished, the interpreter had created a total of 640,041 lazy slices, of which only *19* were ever rendered.

Apart from lazy slices, there's only one more enhancement when compared with v1: string prepending now reuses lazy concatenation objects much more often. There was an optimization in string_concatenate (Python/ceval.c) that said: "if the left-side string has two references, and we're about to overwrite the second reference by storing this concatenation to an object, tell that object to drop its reference". That often meant the reference on the string dropped to 1, which meant PyString_Resize could just resize the left-side string in place and append the right-side. I modified it so it drops the reference to the right-hand operand too. With this change, even with a reduction in the allowable stack depth for right-hand recursion (so it's less likely to blow the stack), I was able to prepend over 86k strings before it forced a render. (Oh, for the record: I ensure depth limits are enforced when combining lazy slices and lazy concatenations, so you still won't blow your stack when you mix them together.)
Here are the highlights of a single apples-to-apples pybench run, 2.6 trunk revision 52413 ("this") versus that same revision with my patch applied ("other"):

    Test                      minimum run-time          average run-time
                              this    other    diff     this    other    diff
    -------------------------------------------------------------------------
    ConcatStrings:           204ms     76ms  +168.4%   213ms     77ms  +177.7%
    CreateStringsWithConcat: 159ms    138ms   +15.7%   163ms    142ms   +15.1%
    StringSlicing:           142ms     86ms   +65.5%   145ms     88ms   +64.6%
    -------------------------------------------------------------------------
    Totals:                 7976ms   7713ms    +3.4%  8257ms   7975ms    +3.5%

I also ran this totally unfair benchmark:

    x = "abcde" * 20000  # 100k characters
    for i in xrange(1000):
        y = x[1:-1]

and found my patched version to be 9759% faster. (You heard that right, 98x faster.)

I'm ready to post the patch. However, as a result of this work, the description on the original patch page is really no longer accurate:

http://sourceforge.net/tracker/index.php?func