Re: [Python-3000] Dropping find/rfind?
Ivan Krstić wrote:
> Jean-Paul Calderone wrote:
>> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
>
> This is the same Itamar who, in the talk I linked a few days ago
> (http://ln-s.net/D+u) extolled buffer as a very real performance
> improvement in fast python networking, and asked for broader and more
> complete support for buffers, rather than their removal.
>
> A bunch of people, myself included, want to use Python as a persistent
> network server. Proper support for reading into already-allocated
> memory, and non-copying strings are pretty indispensable for serious
> production use.

A mutable bytes type with deque-like performance characteristics (i.e. O(1)
insert/pop at index 0 as well as at the end), as well as the appropriate
mutating methods (like read_into()) should go a long way to meeting those
needs.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Dropping find/rfind?
Ron Adam wrote:
> Nick Coghlan wrote:
>> Fredrik Lundh wrote:
>>> Nick Coghlan wrote:
> Nick Coghlan wrote:
>
>> With a variety of "view types", that work like the corresponding builtin
>> type, but reference the original data structure instead of creating copies
> support for string views would require some serious interpreter surgery,
> though, and probably break quite a few extensions...
Why do you say that?
>>> because I happen to know a lot about how Python's string types are
>>> implemented ?
>> I believe you're thinking about something far more sophisticated than what
>> I'm suggesting. I'm just talking about a Python data type in a standard
>> library module that trades off slower performance with smaller strings (due
>> to extra method call overhead) against improved scalability (due to
>> avoidance of copying strings around).
>> make a view of it
>>> so to make a view of a string, you make a view of it ?
>> Yep - by using all those "start" and "stop" optional arguments to builtin
>> string methods to implement the methods of a string view in pure Python. By
>> creating the string view all you would really be doing is a partial
>> application of start and stop arguments on all of the relevant string
>> methods.
>>
>> I've included an example below that just supports __len__, __str__ and
>> partition(). The source object survives for as long as the view does - the
>> idea is that the view should only last while you manipulate the string,
>> with only real strings released outside the function via return statements
>> or yield expressions.
>
>>>> self.source = "%s" % source
>
> I think this should be.
>
>     self.source = source
>
> Other wise you are making copies of the source which is what you
> are trying to avoid. I'm not sure if python would reuse the self.source
> string, but I wouldn't count on it.

CPython 2.5 certainly doesn't reuse the existing string object.
Given that what I wrote is the way to ensure you have a builtin string type
(str or unicode) without coercing actual unicode objects to str objects or
vice-versa, it should probably be subjected to the same optimisation as the
str() and unicode() constructors (i.e., simply increfing and returning the
original builtin string).

> It might be nice if slice objects could be used in more ways in python.
> That may work in most cases where you would want a string view.

That's quite an interesting idea. With that approach, rather than having to
duplicate 'concrete sequence with copying semantics' and 'sequence view with
non-copying semantics' everywhere, you could just provide methods on objects
that returned the appropriate slice objects representing the location of
relevant sections, rather than copies of the sections themselves.

To make that work effectively, you'd need to implement __nonzero__ on slice
objects as "((self.stop - self.start) // self.step) > 0" (either that or
implement __len__, which would contribute to making slice() look more and
more like xrange(), as someone else noted recently).

Using the same signature as partition:

    def partition_indices(self, sep, start=None, stop=None):
        if start is None: start = 0
        if stop is None: stop = len(self)
        try:
            idxsep = self.index(sep, start, stop)
        except ValueError:
            return slice(start, stop), slice(0), slice(0)
        endsep = idxsep + len(sep)
        return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)

Then partition() itself would be equivalent to:

    def partition(self, sep, start=None, stop=None):
        before, sep, after = self.partition_indices(sep, start, stop)
        return self[before], self[sep], self[after]

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
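Nick's sketch assumes a method on the string type that does not exist. Below is a minimal standalone version of the same idea in present-day Python 3, written as free functions. The names `partition_indices`, `slice_is_nonempty` and `partition_via_indices` are hypothetical. The slice objects are computed first, and characters are copied only when they are finally applied to the string:

```python
def partition_indices(s, sep, start=None, stop=None):
    """Locate (before, sep, after) in s as slice objects; nothing is copied.

    Hypothetical free-function version of the str method sketched above.
    """
    if start is None:
        start = 0
    if stop is None:
        stop = len(s)
    try:
        idxsep = s.index(sep, start, stop)
    except ValueError:
        # Separator not found: the whole range is "before", the rest is empty.
        return slice(start, stop), slice(0, 0), slice(0, 0)
    endsep = idxsep + len(sep)
    return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)


def slice_is_nonempty(sl, length):
    # Stand-in for the proposed slice.__nonzero__. A bare slice cannot
    # answer this on its own, because None or negative bounds depend on
    # the length of the sequence it is applied to.
    return len(range(*sl.indices(length))) > 0


def partition_via_indices(s, sep, start=None, stop=None):
    before, found, after = partition_indices(s, sep, start, stop)
    # Only this final step copies characters out of s.
    return s[before], s[found], s[after]
```

For example, `partition_via_indices("key=value", "=")` gives the same result as `"key=value".partition("=")`. The slices can also be kept and applied later, or only selectively.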
Re: [Python-3000] Dropping find/rfind?
Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Ivan Krstić wrote:
> > Jean-Paul Calderone wrote:
> >> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
> >
> > This is the same Itamar who, in the talk I linked a few days ago
> > (http://ln-s.net/D+u) extolled buffer as a very real performance
> > improvement in fast python networking, and asked for broader and more
> > complete support for buffers, rather than their removal.
> >
> > A bunch of people, myself included, want to use Python as a persistent
> > network server. Proper support for reading into already-allocated
> > memory, and non-copying strings are pretty indispensable for serious
> > production use.
>
> A mutable bytes type with deque-like performance characteristics (i.e. O(1)
> insert/pop at index 0 as well as at the end), as well as the appropriate
> mutating methods (like read_into()) should go a long way to meeting those
> needs.

The implementation of deque and the idea behind bytes are not compatible.
Everything I've heard about the proposal for bytes is that it is
effectively a C unsigned char[] with some convenience methods, very
similar to a Python array.array("B"), with different methods. There is
also an implementation in the Py3k branch.

Also, while I would have a use for bytes as currently implemented (with
readinto()), I would have approximately zero use for a deque-like bytes
object (never mind that due to Python not allowing multi-segment buffers,
etc., it would be functionally impossible to get equivalent time bounds).

 - Josiah
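The point about reading into already-allocated memory can be shown with the buffer API that present-day Python 3 provides (`bytearray`, `memoryview`, and `readinto()` on binary streams). This is the later standard API, not anything proposed in this thread. The minimal sketch below uses `io.BytesIO` as a stand-in for a socket or file. It fills a preallocated buffer in place and slices that buffer through a `memoryview`, so no intermediate bytes objects are created. The helper name `read_all_into` is hypothetical:

```python
import io


def read_all_into(stream, buf):
    """Fill the preallocated bytearray `buf` from `stream` in place.

    Returns the number of bytes read; stops early at end of stream.
    """
    view = memoryview(buf)  # slicing a memoryview does not copy
    total = 0
    while total < len(buf):
        n = stream.readinto(view[total:])  # writes directly into buf
        if not n:  # 0 at EOF (None if a non-blocking stream has no data)
            break
        total += n
    return total
```

For example, `read_all_into(io.BytesIO(b"hello world"), bytearray(8))` returns 8 and leaves `b"hello wo"` in the buffer.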
[Python-3000] Making more effective use of slice objects in Py3k
This idea is inspired by the find/rfind string discussion (particularly a
couple of comments from Jim and Ron), but I think the applicability may prove
to be wider than just string methods (e.g. I suspect it may prove useful for
the bytes() type as well).

Copy-on-slice semantics are by far the easiest semantics to deal with in most
cases, as they result in the fewest nasty surprises. However, they have one
obvious drawback: performance can suffer badly when dealing with large
datasets (copying 10 MB chunks of memory around can take a while!).

There are a couple of existing workarounds for this: buffer() objects, and the
start/stop arguments to a variety of string methods. Neither of these is
particularly convenient to work with, and buffer() is slated to go away in
Py3k.

I think an enriched slicing model that allows sequence views to be expressed
easily as "this slice of this sequence" would allow this to be dealt with
cleanly, without requiring every sequence to provide a corresponding "sequence
view" with non-copying semantics. I think Guido's concern that people will
reach for string views when they don't need them is also valid (as I believe
that it is most often inexperience that leads to premature optimization that
then leads to needless code complexity).

The specific changes I suggest based on the find/rfind discussion are:

  1. make range() (what used to be xrange()) a subclass of slice(), so that
     range objects can be used to index sequences. The only differences
     between range() and slice() would then be that start/stop/step will
     never be None for range instances, and range instances act like an
     immutable sequence while slice instances do not (i.e. range objects
     would grow an indices() method).

  2. change range() and slice() to accept slice() instances as arguments so
     that range(range(0)) is equivalent to range(0). (range(x) may throw
     ValueError if x.stop is None).

  3. change APIs that currently accept start/stop arguments (like string
     methods) to accept a single slice() instance instead (possibly raising
     ValueError if step != 1).

  4. provide an additional string method partition_indices() that returns 3
     range() objects instead of 3 new strings.

The new method would have semantics like:

    def partition_indices(self, sep, limits=None):
        if limits is None:
            limits = range(0, len(self))
        else:
            limits = limits.indices(len(self))
        try:
            idxsep = self.index(sep, limits)
        except ValueError:
            return limits, range(0), range(0)
        endsep = idxsep + len(sep)
        return (range(limits.start, idxsep),
                range(idxsep, endsep),
                range(endsep, limits.stop))

With partition() itself being equivalent to:

    def partition(self, sep, subseq=None):
        before, sep, after = self.partition_indices(sep, subseq)
        return self[before], self[sep], self[after]

Finally, an efficient partition-based implementation of the example from
Walter that started the whole discussion about views and the problem with
excessive copying would look like:

    def splitpartition_indices(s):
        rest = range(len(s))
        while 1:
            prefix, lbrace, rest = s.partition_indices("{", rest)
            first, space, rest = s.partition_indices(" ", rest)
            second, rbrace, rest = s.partition_indices("}", rest)
            if prefix:
                yield (None, s[prefix])
            if not (lbrace and space and rbrace):
                break
            yield (s[first], s[second])

(I know the above misses a micro-optimization, in that it calls partition
again on an empty subsequence, even if space or lbrace are False. I believe
doing the three partition calls together makes it much easier to read, and
searching an empty string is pretty quick).
For comparison, here's the normal copying version that has problems scaling to
large strings:

    def splitpartition(s):
        rest = s
        while 1:
            prefix, lbrace, rest = rest.partition_indices("{")
            first, space, rest = rest.partition_indices(" ")
            second, rbrace, rest = rest.partition_indices("}")
            if prefix:
                yield (None, prefix)
            if not (lbrace and space and rbrace):
                break
            yield (first, second)

Should I make a Py3k PEP for this?

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
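The proposed API cannot be run as written. Today `str.index()` does not accept a range, and a string cannot be indexed by a range object. The sketch below emulates the proposal in present-day Python 3 with two hypothetical helpers: `partition_range`, which returns three range objects, and `take`, which applies a range to a string. It assumes step-1 ranges only, and it uses `str.find()` instead of a `try`/`except` around `index()`. Present-day range objects are already false when empty, so the `if prefix:` and `lbrace and space and rbrace` tests in Nick's loop work unchanged:

```python
def take(s, r):
    # Emulates the proposed s[range_obj] for step-1 ranges: only this copies.
    return s[r.start:r.stop]


def partition_range(s, sep, limits=None):
    """Return range objects locating (before, sep, after) in s[limits]."""
    if limits is None:
        limits = range(0, len(s))
    start, stop = limits.start, limits.stop
    idx = s.find(sep, start, stop)
    if idx < 0:
        # Not found: everything is "before", the other two parts are empty.
        return limits, range(0), range(0)
    end = idx + len(sep)
    return range(start, idx), range(idx, end), range(end, stop)


def splitpartition_indices(s):
    rest = range(len(s))
    while True:
        prefix, lbrace, rest = partition_range(s, "{", rest)
        first, space, rest = partition_range(s, " ", rest)
        second, rbrace, rest = partition_range(s, "}", rest)
        if prefix:  # empty ranges are false
            yield (None, take(s, prefix))
        if not (lbrace and space and rbrace):
            break
        yield (take(s, first), take(s, second))
```

For example, `list(splitpartition_indices("foo{spam eggs}bar"))` gives `[(None, 'foo'), ('spam', 'eggs'), (None, 'bar')]`. The string itself is sliced only when `take()` is called.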
Re: [Python-3000] Making more effective use of slice objects in Py3k
Nick Coghlan wrote:

A couple of errors in the sample code.

> The new method would have semantics like:
>
>     def partition_indices(self, sep, limits=None):
>         if limits is None:
>             limits = range(0, len(self))
>         else:
>             limits = limits.indices(len(self))

Either that line should be:

            limits = range(*limits.indices(len(self)))

Or the definition of indices() would need to be changed to return a range()
object instead of a 3-tuple.

> For comparison, here's the normal copying version that has problems scaling
> to large strings:
>
>     def splitpartition(s):
>         rest = s
>         while 1:
>             prefix, lbrace, rest = rest.partition_indices("{")
>             first, space, rest = rest.partition_indices(" ")
>             second, rbrace, rest = rest.partition_indices("}")

Those 3 lines should be:

            prefix, lbrace, rest = rest.partition("{")
            first, space, rest = rest.partition(" ")
            second, rbrace, rest = rest.partition("}")

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
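The first correction depends on the fact that `slice.indices()` returns a plain `(start, stop, step)` tuple rather than a range. The sketch below shows the conversion in present-day Python 3, with the `None` default folded in. The helper name `resolve_limits` is hypothetical:

```python
def resolve_limits(limits, length):
    """Hypothetical helper: turn an optional slice into a concrete range."""
    if limits is None:
        return range(0, length)
    # slice.indices() returns a plain (start, stop, step) tuple rather than
    # a range, so it has to be unpacked into range() explicitly.
    return range(*limits.indices(length))


print(slice(1, None).indices(5))          # (1, 5, 1)
print(resolve_limits(slice(1, None), 5))  # range(1, 5)
print(resolve_limits(slice(0, 100), 5))   # range(0, 5): clamped to length
```

`indices()` already resolves `None` and out-of-range bounds against the length, so the resulting range is safe to pass to the `start`/`stop` arguments of the existing string methods.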
Re: [Python-3000] Dropping find/rfind?
Nick Coghlan wrote:
> Ron Adam wrote:
>> Nick Coghlan wrote:

[clipped]

>> It might be nice if slice objects could be used in more ways in python.
>> That may work in most cases where you would want a string view.
>
> That's quite an interesting idea. With that approach, rather than having to
> duplicate 'concrete sequence with copying semantics' and 'sequence view with
> non-copying semantics' everywhere, you could just provide methods on objects
> that returned the appropriate slice objects representing the location of
> relevant sections, rather than copies of the sections themselves.

Yes, and possibly having more methods that accept slice objects could make
that idea work in a way that would seem more natural.

> To make that work effectively, you'd need to implement __nonzero__ on slice
> objects as "((self.stop - self.start) // self.step) > 0" (either that or
> implement __len__, which would contribute to making slice() look more and
> more like xrange(), as someone else noted recently).

Since xrange() has the same signature, it might be nice to be able to use a
slice object directly in xrange to get indices to a substring or list.

For that to work, slice.indices would need to not return None, and/or xrange
would need to accept None. They differ in how they handle negative indices
as well. So I expect it may be too big of a change.

> Using the same signature as partition:
>
>     def partition_indices(self, sep, start=None, stop=None):
>         if start is None: start = 0
>         if stop is None: stop = len(self)
>         try:
>             idxsep = self.index(sep, start, stop)
>         except ValueError:
>             return slice(start, stop), slice(0), slice(0)
>         endsep = idxsep + len(sep)
>         return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)
>
> Then partition() itself would be equivalent to:
>
>     def partition(self, sep, start=None, stop=None):
>         before, sep, after = self.partition_indices(sep, start, stop)
>         return self[before], self[sep], self[after]
>
> Cheers,
> Nick.
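Ron's remark that slices and xrange treat negative indices differently is easy to check in present-day Python 3, where `range` is the old `xrange`. The sketch uses only built-ins. A slice resolves its bounds against the sequence length, `range()` takes them literally, and `range()` rejects the `None` bounds that slices allow:

```python
data = "abcdefghij"  # length 10
sl = slice(-3, None)

# A slice resolves its bounds against the sequence length...
resolved = range(*sl.indices(len(data)))
print(data[sl], list(resolved))  # hij [7, 8, 9]

# ...whereas range() takes a negative start literally.
literal = range(-3, len(data))
print(len(literal), literal[0])  # 13 -3

# range() also refuses the None bounds that slices allow.
try:
    range(None, len(data))
except TypeError as exc:
    print("TypeError:", exc)
```

Going through `slice.indices()` is therefore the step that would reconcile the two behaviours if range objects were used as slices.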
Just a little timing for the fun of it. ;-)

2.5c1 (r25c1:51305, Aug 17 2006, 10:41:11) [MSC v.1310 32 bit (Intel)]
splitindex     : 0.02866
splitview      : 0.28021
splitpartition : 0.34991
splitslice     : 0.07892

This may not be the best use case (if you can call it that). It does show
that the slice "as a view" idea may have some potential. But underneath it's
just using index, so a well-written function with index will probably always
be faster.

Cheers,
   Ron


"""
Compare different index, string view, and partition methods.
"""

# Split by str.index.
def splitindex(s):
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        yield None, s[pos:posstart]
        yield s[posstart+1:posarg], s[posarg+1:posend]
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield None, rest


# - Simple string view.
class strview(object):
    def __new__(cls, source, start=None, stop=None):
        self = object.__new__(cls)
        self.source = source
        # Test against None explicitly: "stop != None and stop or len(source)"
        # would silently turn a legitimate stop of 0 into len(source).
        self.start = start if start is not None else 0
        self.stop = stop if stop is not None else len(source)
        return self
    def __str__(self):
        return self.source[self.start:self.stop]
    def __len__(self):
        return self.stop - self.start
    def partition(self, sep):
        _src = self.source
        try:
            startsep = _src.index(sep, self.start, self.stop)
        except ValueError:
            # Separator wasn't found!
            return self, _NULL_STR, _NULL_STR
        # Return new views of the three string parts
        endsep = startsep + len(sep)
        return (strview(_src, self.start, startsep),
                strview(_src, startsep, endsep),
                strview(_src, endsep, self.stop))

_NULL_STR = strview('')

def splitview(s):
    rest = strview(s)
    while 1:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, str(prefix))
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (str(first), str(second))


# Split by str.partition.
def splitpartition(s):
    rest = s
    while 1:
        prefix, found, temp = rest.partition("{")
        first, found, temp = temp.partition(" ")
        second, found, temp = temp.partition("}")
        if not found:
            break
        yield None, prefix
        yield fir
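Ron's listing is truncated in the archive, so his exact benchmark cannot be reproduced. The sketch below is a timing harness for present-day Python 3 using `timeit`. It compares an index-based splitter adapted from his `splitindex` with a copying partition-based splitter written here for illustration (this is not his truncated `splitpartition`). The input string and repeat count are arbitrary assumptions, and no particular timings are implied:

```python
import timeit


def splitindex(s):
    # Adapted from Ron's splitindex: locate markers with str.index only.
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        yield None, s[pos:posstart]
        yield s[posstart + 1:posarg], s[posarg + 1:posend]
        pos = posend + 1
    rest = s[pos:]
    if rest:
        yield None, rest


def splitpartition(s):
    # Copying version: every partition() call builds new strings.
    rest = s
    while True:
        prefix, lbrace, rest = rest.partition("{")
        first, space, rest = rest.partition(" ")
        second, rbrace, rest = rest.partition("}")
        if prefix:
            yield None, prefix
        if not (lbrace and space and rbrace):
            break
        yield first, second


def bench(text, number=20):
    """Print wall-clock time for each splitter; results vary by machine."""
    for fn in (splitindex, splitpartition):
        secs = timeit.timeit(lambda: list(fn(text)), number=number)
        print(f"{fn.__name__:15}: {secs:.5f}")


if __name__ == "__main__":
    bench("x{a b}y{c d}z" * 1000)
```

The two splitters agree on inputs where every placeholder has non-empty text before it. Ron's `splitindex` also yields an empty `(None, '')` pair when two placeholders are adjacent.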
Re: [Python-3000] long/int unification
Josiah Carlson <[EMAIL PROTECTED]> writes:

> Also, depending on the objects, one may consider a few other tagged
> objects, like perhaps None, True, and False

I doubt that it's worth it: they are not dynamically computed anyway,
so there is little gain (only avoiding manipulating their refcounts),
and the loss is a greater number of special cases when accessing
contents of every object.

> or even just use 31/63 bits for the tagged integer value, with a 1
> in the lowest bit signifying it as a tagged integer.

This is exactly what the compiler for my language does.

--
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
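The tagging scheme described here uses a 1 in the lowest bit to mark a word as an integer rather than an object pointer. It can be worked through with plain integer arithmetic. The minimal sketch below models machine words as Python ints and assumes 64-bit words. The helper names are hypothetical, and real implementations do this in C:

```python
WORD_BITS = 64  # assumption: 64-bit machine words
# One bit is spent on the tag, leaving 63 bits for a signed value.
MIN_TAGGED = -(1 << (WORD_BITS - 2))
MAX_TAGGED = (1 << (WORD_BITS - 2)) - 1


def tag(n):
    if not MIN_TAGGED <= n <= MAX_TAGGED:
        # Too big for the tagged form: a real VM would allocate a heap long.
        raise OverflowError("value needs a heap-allocated integer")
    return (n << 1) | 1  # shift left, set the low bit as the tag


def is_tagged(word):
    # Aligned object pointers always have a 0 low bit, so this tells them
    # apart. Every refcount update or type lookup would need this check first.
    return word & 1 == 1


def untag(word):
    return word >> 1  # arithmetic shift restores the sign
```

For example, `tag(5)` is `11` and `tag(-5)` is `-9`, and `untag` maps both back. The `is_tagged` check that must precede every pointer dereference is the per-operation cost Guido objects to in the long/int unification thread.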
Re: [Python-3000] Making more effective use of slice objects in Py3k
"Nick Coghlan" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I think an enriched slicing model that allows sequence views to be
> expressed easily as "this slice of this sequence" would allow this to be
> dealt with cleanly, without requiring every sequence to provide a
> corresponding "sequence view" with non-copying semantics.

I think this is promising. I like the potential unification.

> Should I make a Py3k PEP for this?

I think so ;-)

tjr
Re: [Python-3000] long/int unification
On 8/25/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
>
> > In the integer case, it reminds me of James Knight's tagged integer
> > patch to 2.3 [1]. If using long exclusively is 50% slower, why not try
> > the improved speed approach?
>
> looks like GvR was -1000 on this idea at the time, though...

I still am, because it requires extra tests for every incref and decref
and also for every use of an object's type pointer. I worry about the
cost of these tests, but I worry much more about the bugs it will add
when people don't test first. ABC used this approach and we kept
finding bugs due to this problem.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-3000] Making more effective use of slice objects in Py3k
Can you explain in a sentence or two how these changes would be *used*?
Your code examples don't speak for themselves (maybe because it's
Saturday morning :-). Short examples of something clumsy and/or slow
that we'd have to write today compared to something fast and elegant
that we could write after the change would be quite helpful. The exact
inheritance relationship between slice and [x]range seems a fairly
uninteresting detail in comparison.

--Guido

On 8/26/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> This idea is inspired by the find/rfind string discussion (particularly a
> couple of comments from Jim and Ron), but I think the applicability may
> prove to be wider than just string methods (e.g. I suspect it may prove
> useful for the bytes() type as well).
Re: [Python-3000] Making more effective use of slice objects in Py3k
Nick Coghlan <[EMAIL PROTECTED]> wrote:
>
> This idea is inspired by the find/rfind string discussion (particularly a
> couple of comments from Jim and Ron), but I think the applicability may
> prove to be wider than just string methods (e.g. I suspect it may prove
> useful for the bytes() type as well).

A couple of comments...

I don't particularly like the idea of using lists (or really iter(list)),
range, or slice objects as defining what indices remain for a particular
string operation. It just doesn't seem like the *right* thing to do.

> There are a couple of existing workarounds for this: buffer() objects, and
> the start/stop arguments to a variety of string methods. Neither of these is
> particularly convenient to work with, and buffer() is slated to go away in
> Py3k.

Ahh, but string views offer a significantly more reasonable mechanism.

    string = stringview(string)

Now, you can do things like partition(), slicing (with step=1), etc., and
all can return further string views. Users don't need to learn a new
semantic (pass the sequence of indices). We can toss all of the optional
start, stop arguments to all string functions, and replace them with
either of the following:

    result = stringview(string, start=None, stop=None).method(args)

    string = stringview(string)
    result = string[start:stop].method(args)

Perhaps one of the reasons why I prefer string views over this indices
mechanism is because I'm familiar with buffers, the idea of just having a
pointer into another structure, etc. It just feels more natural from my 8
years of C and 6 years of Python.

 - Josiah
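`stringview` is only named in this thread, never specified. The sketch below shows one way it might look in present-day Python 3, using a hypothetical `StringView` class. Like Ron's `strview`, it stores `(source, start, stop)`, supports step-1 slicing, `find()` and `partition()` by delegating to the start/stop arguments of the underlying `str` methods, and copies characters only when converted with `str()`:

```python
class StringView:
    """Hypothetical non-copying view onto part of a str (step 1 only)."""

    __slots__ = ("source", "start", "stop")

    def __init__(self, source, start=None, stop=None):
        # Resolve None and negative bounds the same way slicing does.
        start, stop, _ = slice(start, stop).indices(len(source))
        self.source = source
        self.start = start
        self.stop = max(start, stop)

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        return self.source[self.start:self.stop]  # the only copy

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("only step-1 slices are supported")
        start, stop, _ = key.indices(len(self))
        return StringView(self.source, self.start + start,
                          self.start + max(start, stop))

    def find(self, sub):
        # Positions are reported relative to the view, as for a real slice.
        pos = self.source.find(sub, self.start, self.stop)
        return pos if pos < 0 else pos - self.start

    def partition(self, sep):
        pos = self.find(sep)
        if pos < 0:
            return self, StringView(""), StringView("")
        end = pos + len(sep)
        return self[:pos], self[pos:end], self[end:]
```

Usage then follows the form given above: `StringView(text)[start:stop].find(sub)`. Note that `find()` reports positions relative to the view, not to the original string.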
Re: [Python-3000] find -> index patch
On Thu, Aug 24, 2006 at 03:48:57PM +0200, Fredrik Lundh wrote:
> Michael Chermside wrote:
>
> >> WOW, I love partition. In all the instances that weren't a simple "in"
> >> test I ended up using [r]partition. In some cases one of the returned
> >> strings gets thrown away but in those cases it is guaranteed to be small.
> >> The new code is usually smaller than the old and generally clearer.
> >
> > Wow. That's just beautiful. This has now convinced me that dumping
> > [r]find() (at least!) and pushing people toward using partition will
> > result in pain in the short term (of course), and beautiful, readable
> > code in the long term.
>
> note that partition provides an elegant solution to an important *subset*
> of all problems addressed by find/index.
>
> just like lexical scoping vs. default arguments and map vs. list
> comprehensions, it doesn't address all problems right out of the box, and
> shouldn't be advertised as doing that.
>

After some benchmarking find() can't go away without really hurting readline()
performance. partition performs as well as find for small lines but for large
lines the extra copy to concat the newline separator is a killer (twice as
slow for 50k char lines). index has the opposite problem as the overhead of
setting up a try block makes 50 char lines twice as slow even when the except
clause is never triggered.

A version of partition that returned two values instead of three would solve
the problem but that would just be adding more functions to remove the two
finds or adding behavior flags to partition. Ick.

Most uses of find are better off using partition but if this one case can't
be beat there must be others too.

-Jack
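The readline() point concerns a buffered reader that has to locate a newline and return the line with the separator still attached. The sketch below shows the two styles in present-day Python using a hypothetical pair of helpers, `split_line_find` and `split_line_partition`. It is not the code Jack benchmarked, but it shows where the extra copy comes from when partition() is used:

```python
def split_line_find(buf):
    """Return (line_including_newline, remainder), or (None, buf)."""
    idx = buf.find("\n")
    if idx < 0:
        return None, buf
    # One slice per output: the newline stays attached to the line.
    return buf[:idx + 1], buf[idx + 1:]


def split_line_partition(buf):
    line, sep, rest = buf.partition("\n")
    if not sep:
        return None, buf
    # partition() splits the newline off, so it has to be concatenated
    # back on, which copies the whole line a second time.
    return line + sep, rest
```

Both helpers return the same results. The difference is only in how many times the line's characters are copied, which matters most for very long lines.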
Re: [Python-3000] Making more effective use of slice objects in Py3k
On 8/26/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Nick Coghlan <[EMAIL PROTECTED]> wrote:

> > There are a couple of existing workarounds for
> > this: buffer() objects, and the start/stop arguments
> > to a variety of string methods. Neither of these is
> > particularly convenient to work with, and buffer() is
> > slated to go away in Py3k.

> Ahh, but string views offer a significantly more
> reasonable mechanism.

As I understand it, Nick is suggesting that slice objects be used as a
sequence (not just string) view.

> string = stringview(string)
> ... We can toss all of the optional start, stop
> arguments to all string functions, and replace them
> with either of the following:
> result = stringview(string, start=None, stop=None).method(args)

> string = stringview(string)
> result = string[start:stop].method(args)

Under Nick's proposal, I believe we could replace it with just the final
line.

    result = string[start:stop].method(args)

though there is a chance that (when you want to avoid copying) he is
suggesting explicit slice objects such as

    view=slice(start, stop)
    result = view(string).method(args)

-jJ
[Python-3000] path in py3K Re: [Python-checkins] r51624 - in python/trunk/Lib: genericpath.py macpath.py ntpath.py os2emxpath.py posixpath.py test/test_genericpath.py
In Py3K, is it still safe to assume that a list of paths will be
(enough like) ordinary strings?

I ask because of the various Path object discussions; it wasn't clear
that a Path object should be a sequence of (normalized unicode?)
characters (rather than path components), that the path would always
be normalized or absolute, or even that it would implement the LE (or
LT?) comparison operator.

-jJ

On 8/26/06, jack.diederich <[EMAIL PROTECTED]> wrote:
> Author: jack.diederich
> Date: Sat Aug 26 20:42:06 2006
> New Revision: 51624

> Added: python/trunk/Lib/genericpath.py

> +# Return the longest prefix of all list elements.
> +def commonprefix(m):
> +    "Given a list of pathnames, returns the longest common leading component"
> +    if not m: return ''
> +    s1 = min(m)
> +    s2 = max(m)
> +    n = min(len(s1), len(s2))
> +    for i in xrange(n):
> +        if s1[i] != s2[i]:
> +            return s1[:i]
> +    return s1[:n]
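Jim's question matters for functions like the `commonprefix()` quoted above. Because it relies on `min()`/`max()` and character indexing, it treats pathnames as plain character sequences rather than as sequences of components. The function still exists as `os.path.commonprefix` in present-day Python 3. The short example below uses `posixpath` explicitly so the result does not depend on the platform, and contrasts it with `commonpath()`, which was added later (Python 3.5) and compares path components instead:

```python
import posixpath

paths = ["/usr/lib/python", "/usr/local/lib"]

# Character by character: the result need not be a real directory.
print(posixpath.commonprefix(paths))  # /usr/l

# Component by component (Python 3.5+).
print(posixpath.commonpath(paths))    # /usr
```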
Re: [Python-3000] find -> index patch
On 8/26/06, Jack Diederich <[EMAIL PROTECTED]> wrote:
> After some benchmarking find() can't go away without really hurting
> readline() performance.

Can you elaborate? readline() is typically implemented in C so I'm not
sure I follow.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-3000] Making more effective use of slice objects in Py3k
On 8/26/06, Jim Jewett <[EMAIL PROTECTED]> wrote: > On 8/26/06, Josiah Carlson <[EMAIL PROTECTED]> wrote: > > Nick Coghlan <[EMAIL PROTECTED]> wrote: > > > > There are a couple of existing workarounds for > > > this: buffer() objects, and the start/stop arguments > > > to a variety of string methods. Neither of these is > > > particular convenient to work with, and buffer() is > > > slated to go away in Py3k. > > > Ahh, but string views offer a significantly more > > reasonable mechanism. > > As I understand it, Nick is suggesting that slice objects be used as a > sequence (not just string) view. I have a hard time parsing this sentence. A slice is an object with three immutable attributes -- start, stop, step. How does this double as a string view? > > string = stringview(string) > > ... We can toss all of the optional start, stop > > arguments to all string functions, and replace them > > with either of the following: > > result = stringview(string, start=None, stop=None).method(args) > > > string = stringview(string) > > result = string[start:stop].method(args) > > Under Nick's proposal, I believe we could replace it with just the final line. I still don't see the transformation of clumsy to elegant. Please give me a complete, specific example instead of a generic code snippet. (Also, please don't use 'string' as a variable name. There's a module by that name that I can't get out of my head.) Maybe the idea is that instead of pos = s.find(t, pos) we would write pos += stringview(s)[pos:].find(t) ??? And how is that easier on the eyes? (And note the need to use += because the sliced view renumbers the positions in the original string.) 
>     result = string[start:stop].method(args)
>
> though there is a chance that (when you want to avoid copying) he is
> suggesting explicit slice objects such as
>
>     view=slice(start, stop)
>     result = view(string).method(args)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
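Guido's renumbering point can be checked with ordinary str slices standing in for the hypothetical stringview (plain slices copy, but they renumber positions exactly as a view would). Both helper functions below are made up for illustration: the first relies on find()'s optional start argument and gets absolute positions directly; the second searches a fresh slice each time, so it has to add the offset back and must test for -1 before doing so, a case the one-line `pos +=` form would also have to handle.

```python
def find_all_absolute(s, t):
    """Collect match positions using find()'s optional start argument."""
    if not t:
        raise ValueError("empty search string")
    positions = []
    pos = s.find(t)
    while pos != -1:
        positions.append(pos)
        pos = s.find(t, pos + 1)      # absolute positions, no slicing
    return positions


def find_all_via_slices(s, t):
    """Same result by searching s[offset:] each time (a copy with plain str).

    find() on the slice reports positions relative to offset, so the offset
    must be added back, and -1 must be checked before adding it.
    """
    if not t:
        raise ValueError("empty search string")
    positions = []
    offset = 0
    while True:
        rel = s[offset:].find(t)      # position relative to the slice
        if rel == -1:
            break
        positions.append(offset + rel)
        offset += rel + 1
    return positions


text = "spam and eggs and spam"
print(find_all_absolute(text, "spam"))    # [0, 18]
print(find_all_via_slices(text, "spam"))  # [0, 18]
```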
Re: [Python-3000] path in py3K Re: [Python-checkins] r51624 - in python/trunk/Lib: genericpath.py macpath.py ntpath.py os2emxpath.py posixpath.py test/test_genericpath.py
It is not my intention to adopt the Path module in Py3k.

On 8/26/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> In Py3K, is it still safe to assume that a list of paths will be
> (enough like) ordinary strings?
>
> I ask because of the various Path object discussions; it wasn't clear
> that a Path object should be a sequence of (normalized unicode?)
> characters (rather than path components), that the path would always
> be normalized or absolute, or even that it would implement the LE (or
> LT?) comparison operator.
>
> -jJ
>
> On 8/26/06, jack.diederich <[EMAIL PROTECTED]> wrote:
> > Author: jack.diederich
> > Date: Sat Aug 26 20:42:06 2006
> > New Revision: 51624
>
> > Added: python/trunk/Lib/genericpath.py
>
> > +# Return the longest prefix of all list elements.
> > +def commonprefix(m):
> > +    "Given a list of pathnames, returns the longest common leading component"
> > +    if not m: return ''
> > +    s1 = min(m)
> > +    s2 = max(m)
> > +    n = min(len(s1), len(s2))
> > +    for i in xrange(n):
> > +        if s1[i] != s2[i]:
> > +            return s1[:i]
> > +    return s1[:n]

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
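The quoted commonprefix() compares characters rather than path components, which bears on Jim's question about paths behaving like ordinary strings. Below is a Python 3 transcription of the patch (xrange becomes range; the logic is otherwise unchanged), with two calls showing that the result can stop partway through a directory name.

```python
def commonprefix(m):
    """Return the longest common leading string of all items in m.

    Transcribed from the quoted genericpath.py patch (xrange -> range).
    Comparing only min() and max() is enough: any prefix shared by the
    lexicographically smallest and largest strings is shared by all.
    """
    if not m:
        return ''
    s1 = min(m)
    s2 = max(m)
    n = min(len(s1), len(s2))
    for i in range(n):
        if s1[i] != s2[i]:
            return s1[:i]
    return s1[:n]


print(commonprefix(['/usr/lib/python', '/usr/lib/perl']))  # prints /usr/lib/p
print(commonprefix(['/usr/lib', '/usr/local']))             # prints /usr/l
```

Because the comparison is character-wise, the "common leading component" is really a common leading substring.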
Re: [Python-3000] Making more effective use of slice objects in Py3k
On 8/26/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 8/26/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > On 8/26/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > > Nick Coghlan <[EMAIL PROTECTED]> wrote:
> > > > There are a couple of existing workarounds for
> > > > this: buffer() objects, and the start/stop
> > > > arguments to a variety of string methods.
> > > Ahh, but string views offer a significantly more
> > > reasonable mechanism.
> > As I understand it, Nick is suggesting that slice
> > objects be used as a sequence (not just string)
> > view.
> I have a hard time parsing this sentence. A slice is
> an object with three immutable attributes -- start,
> stop, step. How does this double as a string view?

Poor wording on my part; it is (the application of a slice to a specific
sequence) that could act as a copyless view.

For example, you wanted to keep the rarely used optional arguments to
find() because of efficiency.

    s.find(prefix, start, stop)

does not copy. If slices were less eager at copying, this could be
rewritten as

    view = slice(start, stop, 1)
    view(s).find(prefix)

or perhaps even as

    s[start:stop].find(prefix)

I'm not sure these look better, but they are less surprising, because
they don't depend on optional arguments that most people have
forgotten about.

> Maybe the idea is that instead of
>     pos = s.find(t, pos)
> we would write
>     pos += stringview(s)[pos:].find(t)
> ???

With stringviews, you wouldn't need to be reindexing from the start of
the original string. The idiom would instead be a generalization of
"for line in file:"

    while data:
        chunk, sep, data = data.partition("\n")

but the partition call would not need to copy the entire string; it
could simply return three views.

Yes, this does risk keeping all of data alive because one chunk was
saved. This might be a reasonable tradeoff to avoid the copying. If
not, perhaps the gc system could be augmented to shrink bloated views
during idle moments.
-jJ
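The view-based idiom Jim describes can be sketched in pure Python along the lines Nick suggested earlier in the thread: record (source, start, stop) and forward to the optional start/stop arguments of str.find(). StringView is a hypothetical class written only for illustration, not a standard library type, and it implements just find(), partition() and contiguous slicing. The loop passes an explicit "\n" separator, since str.partition() requires one.

```python
class StringView:
    """Hypothetical read-only window onto source[start:stop] that avoids copying.

    Methods delegate to str methods' optional start/stop arguments, so no
    substring is built until str() is called on the view.
    """

    def __init__(self, source, start=0, stop=None):
        length = len(source)
        self.source = source  # a reference to the original string, not a copy
        self.start = max(0, min(start, length))
        if stop is None:
            stop = length
        self.stop = max(self.start, min(stop, length))

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # The only method that builds a new string.
        return self.source[self.start:self.stop]

    def __getitem__(self, key):
        # Only contiguous slices are supported in this sketch.
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("only contiguous slices are supported")
        start, stop, _ = key.indices(len(self))
        return StringView(self.source, self.start + start, self.start + stop)

    def find(self, sub):
        # Positions are reported relative to the view, like str.find on a slice.
        pos = self.source.find(sub, self.start, self.stop)
        return -1 if pos == -1 else pos - self.start

    def partition(self, sep):
        if not sep:
            raise ValueError("empty separator")
        pos = self.source.find(sep, self.start, self.stop)
        if pos == -1:
            empty = StringView(self.source, self.stop, self.stop)
            return self, empty, empty
        end = pos + len(sep)
        return (StringView(self.source, self.start, pos),
                StringView(self.source, pos, end),
                StringView(self.source, end, self.stop))


# Jim's loop with a concrete separator; every piece is a view on one string.
data = StringView("alpha\nbeta\ngamma")
lines = []
while data:                       # truthiness falls back to __len__
    chunk, sep, data = data.partition("\n")
    lines.append(str(chunk))      # copy only the pieces actually kept
print(lines)                      # ['alpha', 'beta', 'gamma']

# Jim's s[start:stop].find(prefix) form, without copying the slice.
view = StringView("spam and eggs")[9:13]
print(str(view), view.find("g"))  # eggs 1
```

Every view, even a short saved chunk, holds a reference to the whole source string, which is the keep-alive tradeoff Jim mentions.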
Re: [Python-3000] find -> index patch
On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote:
> On 8/26/06, Jack Diederich <[EMAIL PROTECTED]> wrote:
> > After some benchmarking find() can't go away without really hurting
> > readline() performance.
>
> Can you elaborate? readline() is typically implemented in C so I'm not
> sure I follow.

A number of modules in Lib have readline() methods that currently use
find(): StringIO, httplib, tarfile, and others.

sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l
30

Mainly I wanted to point out that find() solves a class of problems that
can't be solved equally well with partition() (bad for large strings that
want to preserve the separator) or index() (bad for large numbers of small
strings and for frequent misses). I wanted to reach the conclusion that
find() could be yanked out, but as Fredrik opined, it is still useful for
a subset of problems.

-Jack
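To make Jack's comparison concrete, here are three ways to split a buffer into newline-terminated lines in the style of a pure-Python readline(). These functions are illustrative sketches, not the actual StringIO, httplib, or tarfile code. The find() version searches from a position and copies only each returned line; the partition() version rebinds the remainder to a new string on every call and has to re-attach the separator; the index() version pays for a raised ValueError whenever the separator is missing.

```python
def readlines_with_find(buf):
    """Split buf into lines, keeping each '\n', the way a find()-based readline() can."""
    lines = []
    pos = 0
    while pos < len(buf):
        nl = buf.find('\n', pos)          # search starts at pos; no slice of buf needed
        end = len(buf) if nl == -1 else nl + 1
        lines.append(buf[pos:end])        # only the returned line is copied
        pos = end
    return lines


def readlines_with_partition(buf):
    """Same output via partition(); rebinding buf copies the remainder every time."""
    lines = []
    while buf:
        line, sep, buf = buf.partition('\n')   # buf becomes a new, shorter copy
        lines.append(line + sep)               # re-attach the separator (another copy)
    return lines


def readlines_with_index(buf):
    """Same output via index(), which raises ValueError on every miss."""
    lines = []
    pos = 0
    while pos < len(buf):
        try:
            end = buf.index('\n', pos) + 1
        except ValueError:                # the final, unterminated line costs an exception
            end = len(buf)
        lines.append(buf[pos:end])
        pos = end
    return lines


sample = "first\nsecond\nno newline at end"
print(readlines_with_find(sample))       # ['first\n', 'second\n', 'no newline at end']
print(readlines_with_partition(sample))  # same list
print(readlines_with_index(sample))      # same list
```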
Re: [Python-3000] Making more effective use of slice objects in Py3k
"Jim Jewett" <[EMAIL PROTECTED]> wrote: > With stringviews, you wouldn't need to be reindexing from the start of > the original string. The idiom would instead be a generalization of > "for line in file:" > > while data: > chunk, sep, data = data.partition() > > but the partition call would not need to copy the entire string; it > could simply return three views. Also, with a little work, having string views be smart about concatenation (if two views are adjacent to each other, like chunk,sep or sep,data above, view1+view2 -> view3 on the original string), copies could further be minimized, and the earlier problem with readline, etc., can be avoided. - Josiah ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com