Re: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
Larry Hastings <[EMAIL PROTECTED]> wrote:
> But I'm open to suggestions, on this or any other aspect of the patch.

As Martin, I, and others have suggested, direct the patch towards Python 3.x unicode text. Also, don't be surprised if Guido says no...

http://mail.python.org/pipermail/python-3000/2006-August/003334.html

In that message he talks about why view+string, string+view, and view+view should all return strings. Some of those arguments are not quite applicable in this case, because with your implementation all additions can return a 'view'. However, he also states the following with regard to strings vs. views (an earlier variant of the "lazy strings" you propose): "Because they can have such different performance and memory usage characteristics, it's not right to treat them as the same type." - GvR

This suggests (at least to me) that unifying the 'lazy string' with the 2.x string is basically out of the question, which brings me back to my earlier suggestion: make it into a wrapper that could be used with 3.x bytes, 3.x text, and perhaps others.

 - Josiah

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
On 2006/10/20, Larry Hastings wrote:
> I'm ready to post the patch.

Sheesh! Where does the time go. I've finally found the time to re-validate and post the patch. It's SF.net patch #1590352:

http://sourceforge.net/tracker/index.php?func=detail&aid=1590352&group_id=5470&atid=305470

I've attached both the patch itself (against the current 2.6 revision, 52618) and a lengthy treatise on the patch and its ramifications as I understand them.

I've also added one more experimental change: a new string method, str.simplify(). All it does is force a lazy concatenation / lazy slice to render. (If the string isn't a lazy string, or it's already been rendered, str.simplify() is a no-op.) The idea is, if you know these consarned "lazy slices" are giving you the oft-cited horrible memory usage scenario, you can tune your app by forcing the slices to render and drop their references. 99% of the time you don't care, and you enjoy the minor speedup. The other 1% of the time, you call .simplify() and your code behaves as it did under 2.5.

Is this the right approach? I dunno. So far I like it better than the alternatives. But I'm open to suggestions, on this or any other aspect of the patch.

Cheers,

/larry/
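[A pure-Python toy illustrating the str.simplify() idea described above. The real patch works inside CPython's stringobject.c in C; the class name, attributes, and structure here are invented purely for demonstration.]

```python
# Illustrative sketch only: a toy "lazy concatenation" object with a
# simplify() method that forces rendering and drops the references to
# the component strings, mirroring the tuning hook Larry describes.
class LazyConcat:
    def __init__(self, left, right):
        self._parts = [left, right]   # references kept; nothing copied yet
        self._rendered = None

    def simplify(self):
        """Force rendering; a no-op if already rendered."""
        if self._rendered is None:
            self._rendered = "".join(self._parts)
            self._parts = None        # release the component strings
        return self._rendered

    def __str__(self):
        return self.simplify()

s = LazyConcat("hello, ", "world")
assert str(s) == "hello, world"
assert s._parts is None   # after rendering, the parts are released
```

In the patched interpreter this would of course be invisible: `a + b` would build the lazy object, and `.simplify()` would be the only new user-visible surface.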
Re: [Python-Dev] The "lazy strings" patch
Jean-Paul Calderone wrote:
> On Mon, 23 Oct 2006 07:58:25 -0700, Larry Hastings <[EMAIL PROTECTED]> wrote:
>> [snip]
>> If external Python extension modules are as well-behaved as the shipping
>> Python source tree, there simply wouldn't be a problem. Python source is
>> delightfully consistent about using the macro PyString_AS_STRING() to get
>> at the creamy char *center of a PyStringObject *. When code religiously
>> uses that macro (or calls PyString_AsString() directly), all it needs is a
>> recompile with the current stringobject.h and it will Just Work.
>>
>> I genuinely don't know how many external Python extension modules are
>> well-behaved in this regard. But in case it helps: I just checked PIL,
>> NumPy, PyWin32, and SWIG, and all of them were well-behaved.
>
> FWIW, http://www.google.com/codesearch?q=+ob_sval

Possibly more enlightening (we *know* string objects play with this field!):

http://www.google.com/codesearch?hl=en&lr=&q=ob_sval+-stringobject.%5Bhc%5D&btnG=Search

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] The "lazy strings" patch
On Oct 24, 2006, at 11:09 AM, Jack Jansen wrote:
> Look at packages such as win32, PyObjC, ctypes, bridges between Python and
> other languages, etc. That's where implementors are tempted to bend the
> rules of Official APIs for the benefit of serious optimizations.

PyObjC should be safe in this regard, I try to conform to the official rules :-)

I do use PyString_AS_STRING outside of the GIL in other extensions though; the lazy strings patch would break that. My code is of course bending the rules here and can easily be fixed by introducing a temporary variable.

Ronald
Re: [Python-Dev] The "lazy strings" patch
On 23-Oct-2006, at 16:58, Larry Hastings wrote:
> I genuinely don't know how many external Python extension modules are
> well-behaved in this regard. But in case it helps: I just checked PIL,
> NumPy, PyWin32, and SWIG, and all of them were well-behaved.
>
> Apart from stringobject.c, there was exactly one spot in the Python source
> tree which made assumptions about the structure of PyStringObjects
> (Mac/Modules/macos.c). It's in the block starting with the comment "This
> is a hack:". Note that this is unfixed in my patch, so just now all code
> using that self-avowed "hack" will break.

As the author of that hack, that gives me an idea for where you should look for code that will break: code that tries to expose low-level C interfaces to Python. (That hack replaced an even earlier, worse hack, which took the id() of a string in Python and added a fixed number to it to get at the address of the string, to fill it into a structure. Blush.)

Look at packages such as win32, PyObjC, ctypes, bridges between Python and other languages, etc. That's where implementors are tempted to bend the rules of Official APIs for the benefit of serious optimizations.

--
Jack Jansen, <[EMAIL PROTECTED]>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings schrieb:
> Am I correct in understanding that changing the Python minor revision
> number (2.5 -> 2.6) requires external modules to recompile? (It
> certainly does on Windows.)

There is an ongoing debate on that. The original intent was that you normally *shouldn't* have to recompile modules just because the Python version changes. Instead, you should do so when PYTHON_API_VERSION changes. Of course, a change such as yours would also cause a change to PYTHON_API_VERSION.

Then, even if PYTHON_API_VERSION changes, you aren't *required* to recompile your extension modules. Instead, you get a warning that the API version is different and *might* require recompilation: it does require recompilation if the extension module relies on some of the changed API. With this change, people not recompiling their extension modules would likely see Python crash rather quickly after seeing the warning about incompatible APIs.

Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
[EMAIL PROTECTED] schrieb:
>>> Anyway, it was my intent to post the patch and see what happened.
>>> Being a first-timer at this, and not having even read the core
>>> development mailing lists for very long, I had no idea what to
>>> expect. Though I genuinely didn't expect it to be this brusque.
>
> Martin> I could have told you :-) The "problem" really is that you are
> Martin> suggesting a major, significant change to the implementation of
> Martin> Python, and one that doesn't fix an obvious bug.
>
> Come on Martin. Give Larry a break.

I'm seriously not complaining, I'm explaining.

> Lots of changes have been accepted to the Python core which weren't
> obvious "bug fixes".

Surely many new features have been implemented over time, but in many cases they weren't really "big changes", in the sense that you could ignore them if you didn't like them. That wouldn't be so in this case: as the string type is very fundamental, people take a greater interest in its implementation.

> In fact, I seem to recall a sprint held recently in Reykjavik where the
> whole point was just to make Python faster.

That's true. I also recall there were serious complaints about the outcome of this sprint, and the changes to the struct module in particular. Still, the struct module is of lesser importance than the string type, so the concerns were smaller.

> I believe that was exactly Larry's point in posting the patch. The "one
> obvious way to do" concatenation and slicing for one of the most heavily
> used types in Python appears to be faster. That seems like a win to me.

Have you reviewed the patch and can you vouch for its correctness, even in boundary cases? Have you tested it in a real application and found a real performance improvement? I have done neither, so I can't speak to the advantages of the patch. I didn't actually object to the inclusion of the patch, either. I was merely stating what I think the problems with "that kind of" patch are.

Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
On Mon, 23 Oct 2006 09:07:51 -0700, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> "Paul Moore" <[EMAIL PROTECTED]> wrote:
>> I had picked up on this comment, and I have to say that I had been a
>> little surprised by the resistance to the change based on the "code
>> would break" argument, when you had made such a thorough attempt to
>> address this. Perhaps others had missed this point, though.
>
> I'm also concerned about future usability.

Me too (perhaps in a different way though).

> Word in the Py3k list is that Python 2.6 will be just about the last
> Python in the 2.x series, and by directing his implementation at only
> Python 2.x strings, he's just about guaranteeing obsolescence.

People will be using 2.x for a long time to come. And in the long run, isn't all software obsolete? :)

> By building with unicode and/or objects with a buffer interface in mind,
> Larry could build with both 2.x and 3.x in mind, and his code wouldn't
> be obsolete the moment it was released.

(I'm not sure what the antecedent of "it" is in the above; I'm going to assume it's Python 3.x.) Supporting unicode strings and objects providing the buffer interface seems like a good idea in general, even disregarding Py3k. Starting with str is reasonable though, since there's still plenty of code that will benefit from this change, if it is indeed a beneficial change.

Larry, I'm going to try to do some benchmarks against Twisted using this patch, but given my current time constraints, you may be able to beat me to this :) If you're interested, Twisted [EMAIL PROTECTED] plus this trial plugin:

http://twistedmatrix.com/trac/browser/sandbox/exarkun/merit/trunk

will let you do some gross measurements using the Twisted test suite. I can give some more specific pointers if this sounds like something you'd want to mess with.

Jean-Paul
Re: [Python-Dev] The "lazy strings" patch
"Paul Moore" <[EMAIL PROTECTED]> wrote:
> I had picked up on this comment, and I have to say that I had been a
> little surprised by the resistance to the change based on the "code
> would break" argument, when you had made such a thorough attempt to
> address this. Perhaps others had missed this point, though.

I'm also concerned about future usability. Word in the Py3k list is that Python 2.6 will be just about the last Python in the 2.x series, and by directing his implementation at only Python 2.x strings, he's just about guaranteeing obsolescence. By building with unicode and/or objects with a buffer interface in mind, Larry could build with both 2.x and 3.x in mind, and his code wouldn't be obsolete the moment it was released.

 - Josiah
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings wrote:
> Am I correct in understanding that changing the Python minor revision
> number (2.5 -> 2.6) requires external modules to recompile?

not, in general, on Unix. it's recommended, but things usually work quite well anyway.
Re: [Python-Dev] The "lazy strings" patch
Larry> The only function that *might* return a non-terminated char * is
Larry> PyString_AsUnterminatedString(). This function is static to
Larry> stringobject.c--and I would be shocked if it were ever otherwise.

If it's static to stringobject.c it doesn't need a PyString_ prefix. In fact, I'd argue that it shouldn't have one, so that people reading the code won't miss the "static" and think it is part of the published API.

Larry> Am I correct in understanding that changing the Python minor
Larry> revision number (2.5 -> 2.6) requires external modules to
Larry> recompile?

Yes, in general, though you can often get away without it if you don't mind Python screaming at you about version mismatches.

Skip
Re: [Python-Dev] The "lazy strings" patch
On 10/23/06, Larry Hastings <[EMAIL PROTECTED]> wrote:
> Steve Holden wrote:
>> But it seems to me that the only major issue is the inability to provide
>> zero-byte terminators with this new representation.
>
> I guess I wasn't clear in my description of the patch; sorry about that.
>
> Like "lazy concatenation objects", "lazy slices" render when you call
> PyString_AsString() on them. Before rendering, the lazy slice's ob_sval
> will be NULL. Afterwards it will point to a proper zero-terminated string,
> at which point the object behaves exactly like any other PyStringObject.

I had picked up on this comment, and I have to say that I had been a little surprised by the resistance to the change based on the "code would break" argument, when you had made such a thorough attempt to address this. Perhaps others had missed this point, though.

> I genuinely don't know how many external Python extension modules are
> well-behaved in this regard. But in case it helps: I just checked PIL,
> NumPy, PyWin32, and SWIG, and all of them were well-behaved.

There's code out there which was written to the Python 1.4 API and has not been updated since (I know, I wrote some of it!). I wouldn't call it "well-behaved" (it writes directly into the string's character buffer) but I don't believe it would fail (it only uses PyString_AsString to get the buffer address).

    /* Allocate a Python string object with uninitialised contents. We
     * must do it this way, so that we can modify the string in place
     * later. See the Python source, Objects/stringobject.c, for details.
     */
    result = PyString_FromStringAndSize(NULL, len);
    if (result == NULL)
        return NULL;

    p = PyString_AsString(result);
    while (*str) {
        if (*str == '\n')
            *p = '\0';
        else
            *p = *str;
        ++p;
        ++str;
    }

> Am I correct in understanding that changing the Python minor revision
> number (2.5 -> 2.6) requires external modules to recompile? (It certainly
> does on Windows.) If so, I could mitigate the problem by renaming ob_sval.
> That way, code making explicit reference to it would fail to compile, which
> I feel is better than silently recompiling unsafe code.

I think you've covered pretty much all the possible backward compatibility bases. A sufficiently evil extension could blow up, I guess, but that's always going to be true.

OTOH, I don't have a comment on the desirability of the patch per se, as (a) I've never been hit by the speed issue, and (b) I'm thoroughly indoctrinated, so I always use ''.join() :-)

Paul.
Re: [Python-Dev] The "lazy strings" patch
On Mon, 23 Oct 2006 07:58:25 -0700, Larry Hastings <[EMAIL PROTECTED]> wrote:
> [snip]
> If external Python extension modules are as well-behaved as the shipping
> Python source tree, there simply wouldn't be a problem. Python source is
> delightfully consistent about using the macro PyString_AS_STRING() to get
> at the creamy char *center of a PyStringObject *. When code religiously
> uses that macro (or calls PyString_AsString() directly), all it needs is a
> recompile with the current stringobject.h and it will Just Work.
>
> I genuinely don't know how many external Python extension modules are
> well-behaved in this regard. But in case it helps: I just checked PIL,
> NumPy, PyWin32, and SWIG, and all of them were well-behaved.

FWIW, http://www.google.com/codesearch?q=+ob_sval

Jean-Paul
Re: [Python-Dev] The "lazy strings" patch
Steve Holden wrote:
> But it seems to me that the only major issue is the inability to provide
> zero-byte terminators with this new representation.

I guess I wasn't clear in my description of the patch; sorry about that.

Like "lazy concatenation objects", "lazy slices" render when you call PyString_AsString() on them. Before rendering, the lazy slice's ob_sval will be NULL. Afterwards it will point to a proper zero-terminated string, at which point the object behaves exactly like any other PyStringObject. The only function that *might* return a non-terminated char * is PyString_AsUnterminatedString(). This function is static to stringobject.c--and I would be shocked if it were ever otherwise.

> If there were any reliable way to make sure these objects never got
> passed to extension modules then I'd say "go for it".

If external Python extension modules are as well-behaved as the shipping Python source tree, there simply wouldn't be a problem. Python source is delightfully consistent about using the macro PyString_AS_STRING() to get at the creamy char *center of a PyStringObject *. When code religiously uses that macro (or calls PyString_AsString() directly), all it needs is a recompile with the current stringobject.h and it will Just Work.

I genuinely don't know how many external Python extension modules are well-behaved in this regard. But in case it helps: I just checked PIL, NumPy, PyWin32, and SWIG, and all of them were well-behaved.

Apart from stringobject.c, there was exactly one spot in the Python source tree which made assumptions about the structure of PyStringObjects (Mac/Modules/macos.c). It's in the block starting with the comment "This is a hack:". Note that this is unfixed in my patch, so just now all code using that self-avowed "hack" will break.

Am I correct in understanding that changing the Python minor revision number (2.5 -> 2.6) requires external modules to recompile? (It certainly does on Windows.) If so, I could mitigate the problem by renaming ob_sval. That way, code making explicit reference to it would fail to compile, which I feel is better than silently recompiling unsafe code.

Cheers,

larry
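[A rough pure-Python analogue of the render-on-demand behaviour Larry describes: the buffer pointer stays NULL until someone asks for the string data, and only then is the slice copied out. The real mechanism lives in C inside stringobject.c; the names here (`LazySlice`, `as_string`, `_sval`) are invented for illustration.]

```python
# Toy model of a "lazy slice": holds a reference to the base string plus
# a range, and renders (copies) only when its buffer is requested --
# roughly what PyString_AsString() triggers in the patched interpreter.
class LazySlice:
    def __init__(self, base, start, stop):
        self._base = base           # keeps the base string alive
        self._range = (start, stop)
        self._sval = None           # analogue of a NULL ob_sval

    def as_string(self):
        """Analogue of PyString_AsString(): render on first use."""
        if self._sval is None:
            start, stop = self._range
            self._sval = self._base[start:stop]
        return self._sval

big = "x" * 1000 + "needle" + "y" * 1000
view = LazySlice(big, 1000, 1006)
assert view._sval is None           # nothing copied yet
assert view.as_string() == "needle"
assert view._sval == "needle"       # now rendered
```

This also makes the compatibility argument concrete: code that always goes through the accessor (the macro/function) never notices the lazy phase, while code that reaches into the struct directly would see the NULL buffer.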
Re: [Python-Dev] The "lazy strings" patch
[EMAIL PROTECTED] wrote:
>>> Anyway, it was my intent to post the patch and see what happened.
>>> Being a first-timer at this, and not having even read the core
>>> development mailing lists for very long, I had no idea what to
>>> expect. Though I genuinely didn't expect it to be this brusque.
>
> Martin> I could have told you :-) The "problem" really is that you are
> Martin> suggesting a major, significant change to the implementation of
> Martin> Python, and one that doesn't fix an obvious bug.

The "obvious bug" that it fixes is slowness <0.75 wink>.

> Come on Martin. Give Larry a break. Lots of changes have been accepted
> to the Python core which weren't obvious "bug fixes". In fact, I seem to
> recall a sprint held recently in Reykjavik where the whole point was just
> to make Python faster. I believe that was exactly Larry's point in
> posting the patch. The "one obvious way to do" concatenation and slicing
> for one of the most heavily used types in Python appears to be faster.
> That seems like a win to me.

I did point out to Larry when he went to c.l.py with the original patch that he would face resistance, so this hasn't blind-sided him.

But it seems to me that the only major issue is the inability to provide zero-byte terminators with this new representation. Because Larry's proposal for handling this involves the introduction of a new API that can't already be in use in extensions, it's obviously the extension writers who would be given the most problems by this patch. I can understand resistance on that score, and I could understand resistance if there were other clear disadvantages to its implementation, but in their absence it seems like the extension modules are the killers.

If there were any reliable way to make sure these objects never got passed to extension modules then I'd say "go for it". Without that it does seem like a potentially widespread change to the C API that could affect much code outside the interpreter.

This is a great shame. I think Larry showed inventiveness and tenacity to get this far, and deserves credit for his achievements no matter whether or not they get into the core.

regards
Steve

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
Re: [Python-Dev] The "lazy strings" patch
>> Anyway, it was my intent to post the patch and see what happened.
>> Being a first-timer at this, and not having even read the core
>> development mailing lists for very long, I had no idea what to
>> expect. Though I genuinely didn't expect it to be this brusque.

Martin> I could have told you :-) The "problem" really is that you are
Martin> suggesting a major, significant change to the implementation of
Martin> Python, and one that doesn't fix an obvious bug.

Come on Martin. Give Larry a break. Lots of changes have been accepted to the Python core which weren't obvious "bug fixes". In fact, I seem to recall a sprint held recently in Reykjavik where the whole point was just to make Python faster. I believe that was exactly Larry's point in posting the patch. The "one obvious way to do" concatenation and slicing for one of the most heavily used types in Python appears to be faster. That seems like a win to me.

Skip
Re: [Python-Dev] The "lazy strings" patch
Josiah Carlson wrote:
> Want my advice? Aim for Py3k text as your primary target, but as a
> wrapper, not as the core type (I put the odds at somewhere around 0 for
> such a core type change). If you are good, and want to make guys like
> me happy, you could even make it support the buffer interface for
> non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior
> of bytes and text.

This is still my preferred approach, too - for local optimisation of an algorithm, a string view type strikes me as an excellent idea. For the core data type, though, keeping the behaviour comparatively simple and predictable counterbalances the desire for more speed.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] The "lazy strings" patch
Josiah Carlson wrote:
> It would be a radical change for Python 2.6, and really the 2.x series,
> likely requiring nontrivial changes to extension modules that deal with
> strings, and the assumptions about strings that have held for over a
> decade.

the assumptions hidden in everyone's use of the C-level string API are the main concern here, at least for me; radically changing the internal format is not a new idea, but it's always been held off because we have no idea how people are using the C API.
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings <[EMAIL PROTECTED]> wrote:
> It was/is my understanding that the early days of a new major revision
> was the most judicious time to introduce big changes. If I had offered
> these patches six months ago for 2.5, they would have had zero chance of
> acceptance. But 2.6 is in its infancy, and so I assumed now was the
> time to discuss sea-change patches like this.

It would be a radical change for Python 2.6, and really the 2.x series, likely requiring nontrivial changes to extension modules that deal with strings, and the assumptions about strings that have held for over a decade. I think 2.6 as an option is a non-starter. Think Py3k, and really, think bytes and unicode.

> The "stringview" discussion you cite was largely speculation, and as I
> recall there were users in both camps ("it'll use more memory overall"
> vs "no it won't"). And, while I saw a test case with microbenchmarks,
> and a "proof-of-concept" where a stringview was a separate object from a
> string, I didn't see any real-world applications tested with this
> approach.
>
> Rather than start in on speculation about it, I have followed that old
> maxim of "show me the code". I've produced actual code that works with
> real strings in Python. I see this as an opportunity for Pythonistas to
> determine the facts for themselves. Now folks can try the patch with
> these real-world applications you cite and find out how it really
> behaves. (Although I realize the Python community is under no
> obligation to do so.)

One of the big concerns brought up in the stringview discussion was that of users expecting one thing and getting another. Slicing a larger string producing a 'view', which then keeps the larger string alive, would be a surprise. By making it a separate object that just *knows* about strings (or really, anything that offers a buffer interface), I was able to make an object that was 1) flexible, 2) usable in any Python, 3) doesn't change the core assumptions about Python, and 4) is expandable to beyond just *strings*. Reason #4 was my primary reason for writing it, because str disappears in Py3k, which is closer to happening than most of us realize.

> If experimentation is the best thing here, I'd be happy to revise the
> patch to facilitate it. For instance, I could add command-line
> arguments letting you tweak the run-time behavior of the patch, like
> changing the minimum size of a lazy slice. Perhaps add code so there's
> a tweakable minimum size of a lazy concatenation too. Or a tweakable
> minimum *ratio* necessary for a lazy slice. I'm open to suggestions.

I believe that would be a waste of time. The odds of it making it into Python 2.x without significant core developer support are pretty close to None, which in Python 2.x is less than 0. I've been down that road; nothing good lies that way.

Want my advice? Aim for Py3k text as your primary target, but as a wrapper, not as the core type (I put the odds at somewhere around 0 for such a core type change). If you are good, and want to make guys like me happy, you could even make it support the buffer interface for non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior of bytes and text.

 - Josiah
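[A minimal sketch of the "wrapper, not core type" approach Josiah advocates: a view object built on the buffer interface, so it works over bytes, array, mmap, and so on. This is my illustration using memoryview, not Josiah's actual stringview code; the class and method names are invented.]

```python
# Sketch: a view wrapper over anything exporting the buffer interface.
# Slicing a BufView yields another BufView over the same buffer with no
# copying; tobytes() materialises a copy when one is actually needed.
class BufView:
    def __init__(self, obj, start=0, stop=None):
        self._mv = memoryview(obj)[start:stop]

    def __len__(self):
        return len(self._mv)

    def __getitem__(self, i):
        if isinstance(i, slice):
            v = BufView.__new__(BufView)   # sub-view, still zero-copy
            v._mv = self._mv[i]
            return v
        return self._mv[i]

    def tobytes(self):
        return self._mv.tobytes()

data = b"the quick brown fox"
v = BufView(data, 4, 9)
assert v.tobytes() == b"quick"
assert v[1:4].tobytes() == b"uic"   # nested views, no copies until tobytes()
```

Because the wrapper is a distinct type, none of the core assumptions about str change, which is exactly the property Guido's quoted objection ("it's not right to treat them as the same type") demands.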
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings schrieb:
> Anyway, it was my intent to post the patch and see what happened. Being
> a first-timer at this, and not having even read the core development
> mailing lists for very long, I had no idea what to expect. Though I
> genuinely didn't expect it to be this brusque.

I could have told you :-) The "problem" really is that you are suggesting a major, significant change to the implementation of Python, and one that doesn't fix an obvious bug. The new code is an order of magnitude more complex than the old one, and the impact that it will have is unknown - but in the worst case, it could have serious negative impact, e.g. if the code is full of errors and causes Python applications to crash in masses. This is, of course, FUD: it is the fear that this might happen, the uncertainty about the quality of the code, and the doubt about the viability of the approach.

There are many aspects to such a change, but my experience is that it primarily takes time. Fredrik Lundh suggested you give up on Python 2.6 and target Python 3.0 right away; it may indeed be the case that Python 2.6 is too close for that kind of change to find enough supporters.

If your primary goal was to contribute to open source, you might want to look at other areas of Python: there are plenty of open bugs ("real bugs" :-), unreviewed patches, etc. For some time, it is more satisfying to work on these, since the likelihood of success is higher.

Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings wrote:
> Martin v. Löwis wrote:
>
> Let's be specific: when there is at least one long-lived small lazy
> slice of a large string, and the large string itself would otherwise
> have been dereferenced and freed, and this small slice is never examined
> by code outside of stringobject.c, this approach means the large string
> becomes long-lived too and thus Python consumes more memory overall. In
> pathological scenarios this memory usage could be characterized as
> "insane".
>
> True dat. Then again, I could suggest some scenarios where this would
> save memory (multiple long-lived large slices of a large string), and
> others where memory use would be a wash (long-lived slices containing
> all or almost all of a large string, or any scenario where slices are
> short-lived). While I think it's clear lazy slices are *faster* on
> average, their overall effect on memory use in real-world Python is not
> yet known. Read on.

I wonder - how expensive would it be for the string slice to have a weak reference, and 'normalize' the slice when the big string is collected? Would the overhead of the weak reference swamp the savings?

--
Talin
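[One way Talin's weak-reference idea can be emulated at the Python level, as a sketch only. In the real patch this would happen in C during deallocation; here a trivial wrapper class stands in for the base string (plain str can't be weak-referenced in CPython), and the base normalizes its surviving slices from __del__, while its data is still reachable. All names are invented; CPython's reference counting makes the `del` deterministic in this toy.]

```python
import weakref

class Base:
    """Stand-in for the big string object."""
    def __init__(self, s):
        self.s = s
        self._slices = weakref.WeakSet()   # track live slices, weakly

    def __del__(self):
        # The base is being collected: hand surviving slices their data
        # so they can copy ("normalize") before the buffer goes away.
        for sl in list(self._slices):
            sl.normalize(self.s)

class WeakSlice:
    def __init__(self, base, start, stop):
        self._base = weakref.ref(base)     # does NOT keep the base alive
        self._range = (start, stop)
        self._copy = None
        base._slices.add(self)

    def normalize(self, data=None):
        if self._copy is None:
            if data is None:
                data = self._base().s      # base still alive on this path
            start, stop = self._range
            self._copy = data[start:stop]
        return self._copy

big = Base("x" * 1000 + "needle" + "y" * 1000)
sl = WeakSlice(big, 1000, 1006)
del big                          # __del__ normalizes the surviving slice
assert sl.normalize() == "needle"
```

This answers half of Talin's question (it is feasible); the cost half - an extra weakref per slice plus callback bookkeeping on every large-string dealloc - would need measurement.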
Re: [Python-Dev] The "lazy strings" patch
Martin v. Löwis wrote:
> It's not clear to me what you want to achieve with these patches, in
> particular, whether you want to see them integrated into Python or not.

I would be thrilled if they were, but it seems less likely with every passing day. If you have some advice on how I might increase the patch's chances I would be all ears.

It was/is my understanding that the early days of a new major revision were the most judicious time to introduce big changes. If I had offered these patches six months ago for 2.5, they would have had zero chance of acceptance. But 2.6 is in its infancy, and so I assumed now was the time to discuss sea-change patches like this.

Anyway, it was my intent to post the patch and see what happened. Being a first-timer at this, and not having even read the core development mailing lists for very long, I had no idea what to expect. Though I genuinely didn't expect it to be this brusque.

> I think this specific approach will find strong resistance.

I'd say the "lazy strings" patch is really two approaches, "lazy concatenation" and "lazy slices". You are right, though; *both* have "found strong resistance".

> Most recently, it was discussed under the name "string view" on the
> Py3k list, see
> http://mail.python.org/pipermail/python-3000/2006-August/003282.html
> Traditionally, the biggest objection is that even small strings may
> consume insane amounts of memory.

Let's be specific: when there is at least one long-lived small lazy slice of a large string, and the large string itself would otherwise have been dereferenced and freed, and this small slice is never examined by code outside of stringobject.c, this approach means the large string becomes long-lived too and thus Python consumes more memory overall. In pathological scenarios this memory usage could be characterized as "insane".

True dat.
Then again, I could suggest some scenarios where this would save memory (multiple long-lived large slices of a large string), and others where memory use would be a wash (long-lived slices containing all or almost all of a large string, or any scenario where slices are short-lived). While I think it's clear lazy slices are *faster* on average, their overall effect on memory use in real-world Python is not yet known. Read on.

> > I bet this generally reduces overall memory usage for slices too.
>
> Channeling Guido: what real-world applications did you study with this
> patch to make such a claim?

I didn't; I don't have any. I must admit to being only a small-scale Python user. Memory use remains about the same in pybench, the biggest Python app I have handy. But, then, it was pretty clearly speculation, not a claim. Yes, I *think* it'd use less memory overall. But I wouldn't *claim* anything yet.

The "stringview" discussion you cite was largely speculation, and as I recall there were users in both camps ("it'll use more memory overall" vs "no it won't"). And, while I saw a test case with microbenchmarks, and a "proof-of-concept" where a stringview was a separate object from a string, I didn't see any real-world applications tested with this approach.

Rather than start in on speculation about it, I have followed that old maxim of "show me the code". I've produced actual code that works with real strings in Python. I see this as an opportunity for Pythonistas to determine the facts for themselves. Now folks can try the patch with these real-world applications you cite and find out how it really behaves. (Although I realize the Python community is under no obligation to do so.)

If experimentation is the best thing here, I'd be happy to revise the patch to facilitate it. For instance, I could add command-line arguments letting you tweak the run-time behavior of the patch, like changing the minimum size of a lazy slice.
Perhaps add code so there's a tweakable minimum size of a lazy concatenation too. Or a tweakable minimum *ratio* necessary for a lazy slice. I'm open to suggestions.

Cheers,

/larry/
Re: [Python-Dev] The "lazy strings" patch
On Sat, 21 Oct 2006, Mark Roberts wrote:
[...]
> If there's a widely recognized argument against this, a link will
> likely sate my curiosity.

Quoting from Martin v. Löwis earlier on the same day you posted:

"""
I think this specific approach will find strong resistance. It has been implemented many times, e.g. (apparently) in NextStep's NSString, and in Java's string type (where a string holds a reference to a character array, a start index, and an end index). Most recently, it was discussed under the name "string view" on the Py3k list, see

http://mail.python.org/pipermail/python-3000/2006-August/003282.html

Traditionally, the biggest objection is that even small strings may consume insane amounts of memory.
"""

John
Re: [Python-Dev] The "lazy strings" patch
Hmm, I have not viewed the patch in question, but I'm curious why we wouldn't want to include such a patch if it were transparent to the user (Python-based or otherwise), especially if it increased performance without sacrificing maintainability or elegance. Further, considering how commonly strings are used in everyday programming, I fail to see why an implementation like this would not be desirable. If there's a widely recognized argument against this, a link will likely sate my curiosity.

Thanks,
Mark

> ---Original Message---
> From: Josiah Carlson <[EMAIL PROTECTED]>
> Subject: Re: [Python-Dev] The "lazy strings" patch
> Sent: 21 Oct '06 22:02
>
> Larry Hastings <[EMAIL PROTECTED]> wrote:
> > I've significantly enhanced my string-concatenation patch, to the
> > point where that name is no longer accurate. So I've redubbed it the
> > "lazy strings" patch.
> [snip]
>
> Honestly, I don't believe that pure strings should be this complicated.
> The implementation of the standard string and unicode type should be as
> simple as possible. The current string and unicode implementations are,
> in my opinion, as simple as possible given Python's needs.
>
> As such, I don't see a need to go mucking about with the standard
> string implementation to make it "lazy" so as to increase performance,
> reduce memory consumption, etc. However, having written a somewhat
> "lazy" string slicing/etc operation class I called a "string view",
> whose discussion and implementation can be found in the py3k list, I do
> believe that having a related type, perhaps with the tree-based
> implementation you have written, or a simple pointer + length variant
> like I have written, would be useful to have available to Python.
>
> I also believe that it smells like a Py3k feature, which suggests that
> you should toss the whole string reliance and switch to unicode, as str
> and unicode become bytes and text in Py3k, with bytes being mutable.
>
> - Josiah
Re: [Python-Dev] The "lazy strings" patch
Larry Hastings <[EMAIL PROTECTED]> wrote:
> I've significantly enhanced my string-concatenation patch, to the point
> where that name is no longer accurate. So I've redubbed it the "lazy
> strings" patch.
[snip]

Honestly, I don't believe that pure strings should be this complicated. The implementation of the standard string and unicode type should be as simple as possible. The current string and unicode implementations are, in my opinion, as simple as possible given Python's needs.

As such, I don't see a need to go mucking about with the standard string implementation to make it "lazy" so as to increase performance, reduce memory consumption, etc. However, having written a somewhat "lazy" string slicing/etc operation class I called a "string view", whose discussion and implementation can be found in the py3k list, I do believe that having a related type, perhaps with the tree-based implementation you have written, or a simple pointer + length variant like I have written, would be useful to have available to Python.

I also believe that it smells like a Py3k feature, which suggests that you should toss the whole string reliance and switch to unicode, as str and unicode become bytes and text in Py3k, with bytes being mutable.

- Josiah
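The "simple pointer + length variant" Josiah describes can be approximated in a few lines of pure Python. This is a hypothetical sketch of the idea, not the actual py3k-list implementation: the view records a base string plus start/stop offsets, re-slicing stays lazy, and characters are copied only when the view is rendered to a real string.

```python
class StringView:
    # A separate view type, not a replacement for str: it shares the
    # base string's storage and copies only on str() conversion.
    __slots__ = ('base', 'start', 'stop')

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if isinstance(i, slice):
            start, stop, step = i.indices(len(self))
            if step == 1:           # re-slicing a view stays lazy
                return StringView(self.base,
                                  self.start + start, self.start + stop)
        return str(self)[i]         # anything else renders first

    def __str__(self):              # rendering: copy at last
        return self.base[self.start:self.stop]

v = StringView("abcdefgh", 1, 6)    # views "bcdef" without copying
w = v[1:3]                          # still a view into the same base
```

The memory trade-off discussed in the thread is visible here: `w` keeps the whole of `v.base` alive until it is rendered.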
Re: [Python-Dev] The "lazy strings" patch
See also the Cedar Ropes work:

http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

Bill
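The core trick of the ropes paper (concatenation builds a binary tree in O(1); producing the flat string is deferred) fits in a few lines of Python. This toy omits the paper's rebalancing and the depth limits Larry's patch enforces; the iterative flatten hints at why those limits matter for deeply right- or left-leaning trees:

```python
class Rope:
    # A rope node: either a leaf (right is None, left is a str) or an
    # internal node whose children are ropes or strings.
    def __init__(self, left, right=None):
        self.left, self.right = left, right
        self.length = len(left) + (len(right) if right is not None else 0)

    def __len__(self):
        return self.length

    def __add__(self, other):       # O(1): no characters are copied here
        return Rope(self, other)

    def flatten(self):
        # Iterative left-to-right walk, so a deep tree doesn't blow
        # the interpreter's recursion limit.
        out, stack = [], [self]
        while stack:
            node = stack.pop()
            if isinstance(node, Rope):
                if node.right is not None:
                    stack.append(node.right)
                stack.append(node.left)
            else:
                out.append(node)
        return ''.join(out)

r = Rope("spam") + "eggs" + "ham"   # tree of three pieces, no copying yet
```

Each `+` above is constant-time; the single `''.join` in `flatten()` does all the copying at once, which is essentially why lazy concatenation can match the `"".join(x)` idiom.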
Re: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
Larry Hastings schrieb:
> I've significantly enhanced my string-concatenation patch, to the point
> where that name is no longer accurate. So I've redubbed it the "lazy
> strings" patch.

It's not clear to me what you want to achieve with these patches, in particular, whether you want to see them integrated into Python or not.

> The major new feature is that string *slices* are also represented with
> a lazy-evaluation placeholder for the actual string, just as
> concatenated strings were in my original patch. The lazy slice object
> stores a reference to the original PyStringObject * it is sliced from,
> and the desired start and stop slice markers. (It only supports step =
> 1.)

I think this specific approach will find strong resistance. It has been implemented many times, e.g. (apparently) in NextStep's NSString, and in Java's string type (where a string holds a reference to a character array, a start index, and an end index). Most recently, it was discussed under the name "string view" on the Py3k list, see

http://mail.python.org/pipermail/python-3000/2006-August/003282.html

Traditionally, the biggest objection is that even small strings may consume insane amounts of memory.

> Its ob_sval is NULL until the string is rendered--but that rarely
> happens! Not only does this mean string slices are faster, but I bet
> this generally reduces overall memory usage for slices too.

Channeling Guido: what real-world applications did you study with this patch to make such a claim?

> I'm ready to post the patch. However, as a result of this work, the
> description on the original patch page is really no longer accurate:
>
> http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5470&atid=305470
>
> Shall I close/delete that patch and submit a new patch with a more
> modern description? After all, there's not a lot of activity on the old
> patch page...

Closing the old issue and opening a new one is fine.
Regards,
Martin
Re: [Python-Dev] The "lazy strings" patch
Talin wrote:
> Interesting - is it possible that the same technique could be used to
> hide differences in character width? Specifically, if I concatenate an
> ascii string with a UTF-32 string, can the up-conversion to UTF-32 also
> be done lazily?

of course. and if all you do with the result is write it to a UTF-8 stream, it doesn't need to be done at all.

this requires a slightly more elaborate C-level API than today's PyString_AS_STRING API, though... (which is why this whole exercise belongs on the Python 3000 lists, not on python-dev for 2.X)
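Fredrik's point can be illustrated with a toy in modern Python, using bytes vs. str to stand in for narrow vs. wide string pieces (the function and names are hypothetical): if the lazy concatenation keeps its pieces rather than building a widened intermediate string, each piece can be encoded straight to the output stream and no up-conversion ever happens.

```python
import io

def write_utf8(stream, pieces):
    # Encode each piece of a "lazy concatenation" directly to the
    # stream: no widened intermediate copy of the whole result is built.
    for piece in pieces:
        if isinstance(piece, str):
            stream.write(piece.encode('utf-8'))   # wide piece: encode now
        else:
            stream.write(piece)                   # already 8-bit data

buf = io.BytesIO()
write_utf8(buf, [b'ascii part, ', 'wide part: caf\u00e9'])
```

Only the wide piece pays any encoding cost; the ASCII piece is passed through untouched.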
Re: [Python-Dev] The "lazy strings" patch
Interesting - is it possible that the same technique could be used to hide differences in character width? Specifically, if I concatenate an ascii string with a UTF-32 string, can the up-conversion to UTF-32 also be done lazily? If that could be done efficiently, it would resolve some outstanding issues that have come up on the Python-3000 list with regards to str/unicode convergence.

Larry Hastings wrote:
> I've significantly enhanced my string-concatenation patch, to the point
> where that name is no longer accurate. So I've redubbed it the "lazy
> strings" patch.
>
> The major new feature is that string *slices* are also represented with
> a lazy-evaluation placeholder for the actual string, just as
> concatenated strings were in my original patch. The lazy slice object
> stores a reference to the original PyStringObject * it is sliced from,
> and the desired start and stop slice markers. (It only supports step =
> 1.) Its ob_sval is NULL until the string is rendered--but that rarely
> happens! Not only does this mean string slices are faster, but I bet
> this generally reduces overall memory usage for slices too.
>
> Now, one rule of the Python programming API is that "all strings are
> zero-terminated". That part of the API makes the life of a Python
> extension author sane--they don't have to deal with some exotic Python
> string class, they can just assume C-style strings everywhere.
> Ordinarily, this means a string slice couldn't simply point into the
> original string; if it did, and you executed
>     x = "abcde"
>     y = x[1:4]
> internally y->ob_sval[3] would not be 0, it would be 'e', breaking the
> API's rule about strings.
>
> However! When a PyStringObject lives out its life purely within the
> Python VM, the only code that strenuously examines its internals is
> stringobject.c. And that code almost never needs the trailing zero*.
> So I've added a new static method in stringobject.c: >char * PyString_AsUnterminatedString(PyStringObject *) > If you call it on a lazy-evaluation slice object, it gives you back a > pointer into the original string's ob_sval. The s->ob_size'th element > of this *might not* be zero, but if you call this function you're saying > that's a-okay, you promise not to look at it. (If the PyStringObject * > is any other variety, it calls into PyString_AsString, which renders > string concatenation objects then returns ob_sval.) > > Again: this behavior is *never* observed by anyone outside of > stringobject.c. External users of PyStringObjects call > PyString_AS_STRING(), which renders all lazy concatenation and lazy > slices so they look just like normal zero-terminated PyStringObjects. > With my patch applied, trunk still passes all expected tests. > > Of course, lazy slice objects aren't just for literal slices created > with [x:y]. There are lots of string methods that return what are > effectively string slices, like lstrip() and split(). > > With this code in place, string slices that aren't examined by modules > are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10 > (whatever that means)) while collecting some statistics. When it > finished, the interpreter had created a total of 640,041 lazy slices, of > which only *19* were ever rendered. > > > Apart from lazy slices, there's only one more enhancement when compared > with v1: string prepending now reuses lazy concatenation objects much > more often. There was an optimization in string_concatenate > (Python/ceval.c) that said: "if the left-side string has two references, > and we're about to overwrite the second reference by storing this > concatenation to an object, tell that object to drop its reference". > That often meant the reference on the string dropped to 1, which meant > PyString_Resize could just resize the left-side string in place and > append the right-side. 
> I modified it so it drops the reference to the right-hand operand too.
> With this change, even with a reduction in the allowable stack depth
> for right-hand recursion (so it's less likely to blow the stack), I was
> able to prepend over 86k strings before it forced a render. (Oh, for
> the record: I ensure depth limits are enforced when combining lazy
> slices and lazy concatenations, so you still won't blow your stack when
> you mix them together.)
>
> Here are the highlights of a single apples-to-apples pybench run, 2.6
> trunk revision 52413 ("this") versus that same revision with my patch
> applied ("other"):
>
>     Test                      minimum run-time          average run-time
>                               this    other    diff     this    other    diff
>     -------------------------------------------------------------------------
>     ConcatStrings:           204ms     76ms  +168.4%   213ms     77ms  +177.7%
>     CreateStringsWithConcat: 159ms    138ms   +15.7%   163ms    142ms   +15.1%
>     Stri
[Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom]
I've significantly enhanced my string-concatenation patch, to the point where that name is no longer accurate. So I've redubbed it the "lazy strings" patch.

The major new feature is that string *slices* are also represented with a lazy-evaluation placeholder for the actual string, just as concatenated strings were in my original patch. The lazy slice object stores a reference to the original PyStringObject * it is sliced from, and the desired start and stop slice markers. (It only supports step = 1.) Its ob_sval is NULL until the string is rendered--but that rarely happens! Not only does this mean string slices are faster, but I bet this generally reduces overall memory usage for slices too.

Now, one rule of the Python programming API is that "all strings are zero-terminated". That part of the API makes the life of a Python extension author sane--they don't have to deal with some exotic Python string class, they can just assume C-style strings everywhere. Ordinarily, this means a string slice couldn't simply point into the original string; if it did, and you executed

    x = "abcde"
    y = x[1:4]

internally y->ob_sval[3] would not be 0, it would be 'e', breaking the API's rule about strings.

However! When a PyStringObject lives out its life purely within the Python VM, the only code that strenuously examines its internals is stringobject.c. And that code almost never needs the trailing zero*. So I've added a new static method in stringobject.c:

    char * PyString_AsUnterminatedString(PyStringObject *)

If you call it on a lazy-evaluation slice object, it gives you back a pointer into the original string's ob_sval. The s->ob_size'th element of this *might not* be zero, but if you call this function you're saying that's a-okay, you promise not to look at it. (If the PyStringObject * is any other variety, it calls into PyString_AsString, which renders string concatenation objects then returns ob_sval.)

Again: this behavior is *never* observed by anyone outside of stringobject.c.
External users of PyStringObjects call PyString_AS_STRING(), which renders all lazy concatenation and lazy slices so they look just like normal zero-terminated PyStringObjects. With my patch applied, trunk still passes all expected tests.

Of course, lazy slice objects aren't just for literal slices created with [x:y]. There are lots of string methods that return what are effectively string slices, like lstrip() and split().

With this code in place, string slices that aren't examined by modules are very rarely rendered. I ran "pybench -n 2" (two rounds, warp 10 (whatever that means)) while collecting some statistics. When it finished, the interpreter had created a total of 640,041 lazy slices, of which only *19* were ever rendered.

Apart from lazy slices, there's only one more enhancement when compared with v1: string prepending now reuses lazy concatenation objects much more often. There was an optimization in string_concatenate (Python/ceval.c) that said: "if the left-side string has two references, and we're about to overwrite the second reference by storing this concatenation to an object, tell that object to drop its reference". That often meant the reference on the string dropped to 1, which meant PyString_Resize could just resize the left-side string in place and append the right-side. I modified it so it drops the reference to the right-hand operand too. With this change, even with a reduction in the allowable stack depth for right-hand recursion (so it's less likely to blow the stack), I was able to prepend over 86k strings before it forced a render. (Oh, for the record: I ensure depth limits are enforced when combining lazy slices and lazy concatenations, so you still won't blow your stack when you mix them together.)
Here are the highlights of a single apples-to-apples pybench run, 2.6 trunk revision 52413 ("this") versus that same revision with my patch applied ("other"):

    Test                      minimum run-time          average run-time
                              this    other    diff     this    other    diff
    -------------------------------------------------------------------------
    ConcatStrings:           204ms     76ms  +168.4%   213ms     77ms  +177.7%
    CreateStringsWithConcat: 159ms    138ms   +15.7%   163ms    142ms   +15.1%
    StringSlicing:           142ms     86ms   +65.5%   145ms     88ms   +64.6%
    -------------------------------------------------------------------------
    Totals:                 7976ms   7713ms    +3.4%  8257ms   7975ms    +3.5%

I also ran this totally unfair benchmark:

    x = "abcde" * 20000  # 100k characters
    for i in xrange(1000):
        y = x[1:-1]

and found my patched version to be 9759% faster. (You heard that right, 98x faster.)

I'm ready to post the patch. However, as a result of this work, the description on the original patch page is really no longer accurate:

http://sourceforge.net/tracker/index.php?func