Re: [Python-3000] Droping find/rfind?

2006-08-26 Thread Nick Coghlan
Ivan Krstić wrote:
> Jean-Paul Calderone wrote:
>> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
> 
> This is the same Itamar who, in the talk I linked a few days ago
> (http://ln-s.net/D+u) extolled buffer as a very real performance
> improvement in fast python networking, and asked for broader and more
> complete support for buffers, rather than their removal.
> 
> A bunch of people, myself included, want to use Python as a persistent
> network server. Proper support for reading into already-allocated
> memory, and non-copying strings are pretty indispensable for serious
> production use.

A mutable bytes type with deque-like performance characteristics (i.e. O(1) 
insert/pop at index 0 as well as at the end), together with the appropriate 
mutating methods (like read_into()), should go a long way to meeting those needs.
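
As a rough illustration of "reading into already-allocated memory", here is a minimal sketch. It assumes present-day Python 3, where bytearray, memoryview and the readinto() method of binary streams exist; none of these is the deque-like type proposed above, and read_chunks_into is a hypothetical helper name.

```python
import io

def read_chunks_into(stream, buf):
    """Fill the same pre-allocated buffer repeatedly, yielding a view of each chunk."""
    view = memoryview(buf)
    while True:
        n = stream.readinto(buf)   # writes into buf, allocates no new bytes object
        if not n:
            break
        yield view[:n]             # slicing a memoryview does not copy

stream = io.BytesIO(b"abcdefghij")
buf = bytearray(4)
# bytes(chunk) copies, but only so the result can be inspected after buf is reused
chunks = [bytes(chunk) for chunk in read_chunks_into(stream, buf)]
```

The buffer is allocated once and reused for every read, which is the property the network-server use case cares about.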

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com


Re: [Python-3000] Droping find/rfind?

2006-08-26 Thread Nick Coghlan
Ron Adam wrote:
> Nick Coghlan wrote:
>> Fredrik Lundh wrote:
>>> Nick Coghlan wrote:
>>>
>>>>> Nick Coghlan wrote:
>>>>>
>>>>>> With a variety of "view types", that work like the corresponding builtin
>>>>>> type, but reference the original data structure instead of creating copies
>>>>> support for string views would require some serious interpreter surgery,
>>>>> though, and probably break quite a few extensions...
>>>> Why do you say that?
>>> because I happen to know a lot about how Python's string types are
>>> implemented ?
>> I believe you're thinking about something far more sophisticated than what
>> I'm suggesting. I'm just talking about a Python data type in a standard
>> library module that trades off slower performance with smaller strings (due
>> to extra method call overhead) against improved scalability (due to
>> avoidance of copying strings around).
>>
>>>> make a view of it
>>> so to make a view of a string, you make a view of it ?
>> Yep - by using all those "start" and "stop" optional arguments to builtin
>> string methods to implement the methods of a string view in pure Python. By
>> creating the string view all you would really be doing is a partial
>> application of start and stop arguments on all of the relevant string
>> methods.
>>
>> I've included an example below that just supports __len__, __str__ and
>> partition(). The source object survives for as long as the view does - the
>> idea is that the view should only last while you manipulate the string, with
>> only real strings released outside the function via return statements or
>> yield expressions.
> 
> 
>>>>  self.source = "%s" % source
> 
> I think this should be:
> 
> self.source = source
> 
> Otherwise you are making copies of the source, which is what you
> are trying to avoid.  I'm not sure if python would reuse the self.source 
> string, but I wouldn't count on it.

CPython 2.5 certainly doesn't reuse the existing string object. Given that 
what I wrote is the way to ensure you have a builtin string type (str or 
unicode) without coercing actual unicode objects to str objects or vice-versa, 
it should probably be subjected to the same optimisation as the str() and 
unicode() constructors (i.e., simply increfing and returning the original 
builtin string).

> It might be nice if slice objects could be used in more ways in python. 
> That may work in most cases where you would want a string view.

That's quite an interesting idea. With that approach, rather than having to 
duplicate 'concrete sequence with copying semantics' and 'sequence view with 
non-copying semantics' everywhere, you could just provide methods on objects 
that returned the appropriate slice objects representing the location of 
relevant sections, rather than copies of the sections themselves.

To make that work effectively, you'd need to implement __nonzero__ on slice 
objects as "((self.stop - self.start) // self.step) > 0" (Either that or 
implement __len__, which would contribute to making slice() look more and more 
like xrange(), as someone else noted recently).
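
A minimal sketch of that emptiness test, assuming present-day Python 3 where slice.indices() and range() are available; slice_len and slice_nonempty are hypothetical helper names. Going through range() also covers None fields and steps larger than 1, where the plain floor division can under-count (slice(0, 1, 2) selects one item, yet (1 - 0) // 2 is 0).

```python
def slice_len(sl, seq_len):
    """Number of items sl selects from a sequence of length seq_len."""
    # indices() resolves None and negative values against seq_len
    return len(range(*sl.indices(seq_len)))

def slice_nonempty(sl, seq_len):
    """Truth value a slice could report via __nonzero__/__len__."""
    return slice_len(sl, seq_len) > 0
```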

Using the same signature as partition:

def partition_indices(self, sep, start=None, stop=None):
    if start is None: start = 0
    if stop is None: stop = len(self)
    try:
        idxsep = self.index(sep, start, stop)
    except ValueError:
        return slice(start, stop), slice(0), slice(0)
    endsep = idxsep + len(sep)
    return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)

Then partition() itself would be equivalent to:

def partition(self, sep, start=None, stop=None):
    before, sep, after = self.partition_indices(sep, start, stop)
    return self[before], self[sep], self[after]
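
Since str cannot be given new methods from Python code, the same idea can be tried out with free functions. This is only a sketch under that assumption: the function names mirror the ones above, but taking the string as the first argument is an adaptation made for this example.

```python
def partition_indices(s, sep, start=None, stop=None):
    """Return three slice objects locating (before, sep, after) inside s."""
    if start is None:
        start = 0
    if stop is None:
        stop = len(s)
    try:
        idxsep = s.index(sep, start, stop)
    except ValueError:
        # Separator not found: everything is "before", the rest is empty
        return slice(start, stop), slice(0), slice(0)
    endsep = idxsep + len(sep)
    return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)

def partition(s, sep, start=None, stop=None):
    """Copying partition built on top of the index-only version."""
    before, found, after = partition_indices(s, sep, start, stop)
    return s[before], s[found], s[after]
```

With no start/stop given, partition(s, sep) should agree with the builtin s.partition(sep); copies are made only at the final indexing step.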

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-3000] Droping find/rfind?

2006-08-26 Thread Josiah Carlson

Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Ivan Krstić wrote:
> > Jean-Paul Calderone wrote:
> >> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
> > 
> > This is the same Itamar who, in the talk I linked a few days ago
> > (http://ln-s.net/D+u) extolled buffer as a very real performance
> > improvement in fast python networking, and asked for broader and more
> > complete support for buffers, rather than their removal.
> > 
> > A bunch of people, myself included, want to use Python as a persistent
> > network server. Proper support for reading into already-allocated
> > memory, and non-copying strings are pretty indispensable for serious
> > production use.
> 
> A mutable bytes type with deque-like performance characteristics (i.e O(1) 
> insert/pop at index 0 as well as at the end), as well as the appropriate 
> mutating methods (like read_into()) should go a long way to meeting those 
> needs.

The implementation of deque and the idea behind bytes are not compatible. 
Everything I've heard about the proposal of bytes is that it is
effectively a C unsigned char[] with some convenience methods, very
similar to a Python array.array("B"), with different methods.  There is
also an implementation in the Py3k branch.

Also, while I would have a use for bytes as currently implemented (with
readinto() ), I would have approximately zero use for a deque-like bytes
object (never mind that due to Python not allowing multi-segment buffers,
etc., it would be functionally impossible to get equivalent time bounds).

 - Josiah



[Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Nick Coghlan
This idea is inspired by the find/rfind string discussion (particularly a 
couple of comments from Jim and Ron), but I think the applicability may prove 
to be wider than just string methods (e.g. I suspect it may prove useful for 
the bytes() type as well).

Copy-on-slice semantics are by far the easiest semantics to deal with in most 
cases, as they result in the fewest nasty surprises. However, they have one 
obvious drawback: performance can suffer badly when dealing with large 
datasets (copying 10 MB chunks of memory around can take a while!).

There are a couple of existing workarounds for this: buffer() objects, and the 
start/stop arguments to a variety of string methods. Neither of these is 
particularly convenient to work with, and buffer() is slated to go away in Py3k.

I think an enriched slicing model that allows sequence views to be expressed 
easily as "this slice of this sequence" would allow this to be dealt with 
cleanly, without requiring every sequence to provide a corresponding "sequence 
view" with non-copying semantics. I think Guido's concern that people will 
reach for string views when they don't need them is also valid (as I believe 
that it is most often inexperience that leads to premature optimization that 
then leads to needless code complexity).

The specific changes I suggest based on the find/rfind discussion are:

   1. make range() (what used to be xrange()) a subclass of slice(), so that 
range objects can be used to index sequences. The only differences between 
range() and slice() would then be that start/stop/step will never be None for 
range instances, and range instances act like an immutable sequence while 
slice instances do not (i.e. range objects would grow an indices() method).

   2. change range() and slice() to accept slice() instances as arguments so 
that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError 
if x.stop is None).

   3. change APIs that currently accept start/stop arguments (like string 
methods) to accept a single slice() instance instead (possibly raising 
ValueError if step != 1).

   4. provide an additional string method partition_indices() that returns 3 
range() objects instead of 3 new strings

The new method would have semantics like:

   def partition_indices(self, sep, limits=None):
       if limits is None:
           limits = range(0, len(self))
       else:
           limits = limits.indices(len(self))
       try:
           idxsep = self.index(sep, limits)
       except ValueError:
           return limits, range(0), range(0)
       endsep = idxsep + len(sep)
       return (range(limits.start, idxsep),
               range(idxsep, endsep),
               range(endsep, limits.stop))

With partition() itself being equivalent to:

   def partition(self, sep, subseq=None):
       before, sep, after = self.partition_indices(sep, subseq)
       return self[before], self[sep], self[after]

Finally, an efficient partition based implementation of the example from 
Walter that started the whole discussion about views and the problem with 
excessive copying would look like:

def splitpartition_indices(s):
    rest = range(len(s))
    while 1:
        prefix, lbrace, rest = s.partition_indices("{", rest)
        first, space, rest = s.partition_indices(" ", rest)
        second, rbrace, rest = s.partition_indices("}", rest)
        if prefix:
            yield (None, s[prefix])
        if not (lbrace and space and rbrace):
            break
        yield (s[first], s[second])

(I know the above misses a micro-optimization, in that it calls partition 
again on an empty subsequence, even if space or lbrace are False. I believe 
doing the three partition calls together makes it much easier to read, and 
searching an empty string is pretty quick).
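
The proposal cannot run as written on any released Python, because range objects cannot index strings and str has no partition_indices() method. The following sketch approximates it in present-day Python 3 under those assumptions: partition_indices is a free function, and the hypothetical helper cut() stands in for indexing a string with a range.

```python
def partition_indices(s, sep, limits=None):
    """Return three range objects locating (before, sep, after) inside s."""
    if limits is None:
        limits = range(len(s))
    try:
        idxsep = s.index(sep, limits.start, limits.stop)
    except ValueError:
        return limits, range(0), range(0)
    endsep = idxsep + len(sep)
    return (range(limits.start, idxsep),
            range(idxsep, endsep),
            range(endsep, limits.stop))

def cut(s, r):
    """Copy out the part of s described by a step-1 range."""
    return s[r.start:r.stop]

def splitpartition_indices(s):
    rest = range(len(s))
    while True:
        prefix, lbrace, rest = partition_indices(s, "{", rest)
        first, space, rest = partition_indices(s, " ", rest)
        second, rbrace, rest = partition_indices(s, "}", rest)
        if prefix:
            yield (None, cut(s, prefix))
        if not (lbrace and space and rbrace):
            break
        yield (cut(s, first), cut(s, second))
```

Only the pieces that are actually yielded get copied; the "rest" of the string is tracked as a pair of integers rather than as a new string on every step.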

For comparison, here's the normal copying version that has problems scaling to 
large strings:

def splitpartition(s):
    rest = s
    while 1:
        prefix, lbrace, rest = rest.partition_indices("{")
        first, space, rest = rest.partition_indices(" ")
        second, rbrace, rest = rest.partition_indices("}")
        if prefix:
            yield (None, prefix)
        if not (lbrace and space and rbrace):
            break
        yield (first, second)

Should I make a Py3k PEP for this?

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Nick Coghlan

Nick Coghlan wrote:

A couple of errors in the sample code.

> The new method would have semantics like:
> 
>    def partition_indices(self, sep, limits=None):
>        if limits is None:
>            limits = range(0, len(self))
>        else:
>            limits = limits.indices(len(self))

Either that line should be:
limits = range(*limits.indices(len(self)))

Or the definition of indices() would need to be changed to return a range() 
object instead of a 3-tuple.

> For comparison, here's the normal copying version that has problems scaling 
> to 
> large strings:
> 
> def splitpartition(s):
>     rest = s
>     while 1:
>         prefix, lbrace, rest = rest.partition_indices("{")
>         first, space, rest = rest.partition_indices(" ")
>         second, rbrace, rest = rest.partition_indices("}")

Those 3 lines should be:
   prefix, lbrace, rest = rest.partition("{")
   first, space, rest = rest.partition(" ")
   second, rbrace, rest = rest.partition("}")

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-3000] Droping find/rfind?

2006-08-26 Thread Ron Adam
Nick Coghlan wrote:
> Ron Adam wrote:
>> Nick Coghlan wrote:

[clipped]

>> It might be nice if slice objects could be used in more ways in python. 
>> That may work in most cases where you would want a string view.
> 
> That's quite an interesting idea. With that approach, rather than having to 
> duplicate 'concrete sequence with copying semantics' and 'sequence view with 
> non-copying semantics' everywhere, you could just provide methods on objects 
> that returned the appropriate slice objects representing the location of 
> relevant sections, rather than copies of the sections themselves.

Yes, and possibly having more methods that accept slice objects could 
make that idea work in a way that would seem more natural.


> To make that work effectively, you'd need to implement __nonzero__ on slice 
> objects as "((self.stop - self.start) // self.step) > 0" (Either that or 
> implement __len__, which would contribute to making slice() look more and 
> more 
> like xrange(), as someone else noted recently).

Since xrange() has the same signature, it might be nice to be able to
use a slice object directly in xrange to get indices to a substring or list.

For that to work, slice.indices would need to not return None, and/or
xrange would need to accept None.  They differ in how they handle
negative indices as well.  So I expect it may be too big of a change.
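
To make the difference concrete, here is a small sketch in present-day Python 3 (where range() plays the role of xrange()). indices_for is a hypothetical helper; slice.indices() is the existing method that resolves None and negative values against a sequence length.

```python
def indices_for(sl, seq):
    """Resolve a slice against seq and return the matching range of indices."""
    return range(*sl.indices(len(seq)))

letters = "abcdef"
last_three = slice(-3, None)

# A slice counts negative values from the end of the sequence...
resolved = list(indices_for(last_three, letters))
# ...whereas range treats them as plain negative numbers.
literal = list(range(-3, 0))
```

Here resolved holds the positions of the last three letters, while literal is just the integers from -3 up to -1.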


> Using the same signature as partition:
> 
> def partition_indices(self, sep, start=None, stop=None):
>     if start is None: start = 0
>     if stop is None: stop = len(self)
>     try:
>         idxsep = self.index(sep, start, stop)
>     except ValueError:
>         return slice(start, stop), slice(0), slice(0)
>     endsep = idxsep + len(sep)
>     return slice(start, idxsep), slice(idxsep, endsep), slice(endsep, stop)
> 
> Then partition() itself would be equivalent to:
> 
> def partition(self, sep, start=None, stop=None):
>     before, sep, after = self.partition_indices(sep, start, stop)
>     return self[before], self[sep], self[after]
> 
> Cheers,
> Nick.


Just a little timing for the fun of it. ;-)


2.5c1 (r25c1:51305, Aug 17 2006, 10:41:11) [MSC v.1310 32 bit (Intel)]
splitindex  : 0.02866
splitview   : 0.28021
splitpartition  : 0.34991
splitslice  : 0.07892


This may not be the best use case, (if you can call it that).  It does 
show that the slice "as a view" idea may have some potential. But 
underneath it's just using index, so a well written function with index 
will probably always be faster.

Cheers,
Ron


"""
 Compare different index, string view, and partition methods.
"""

#  Split by str.index.
def splitindex(s):
    pos = 0
    while True:
        try:
            posstart = s.index("{", pos)
            posarg = s.index(" ", posstart)
            posend = s.index("}", posarg)
        except ValueError:
            break
        yield None, s[pos:posstart]
        yield s[posstart+1:posarg], s[posarg+1:posend]
        pos = posend+1
    rest = s[pos:]
    if rest:
        yield None, rest


# - Simple string view.
class strview(object):
    def __new__(cls, source, start=None, stop=None):
        self = object.__new__(cls)
        self.source = source
        # Conditional expressions keep an explicit 0; the "x != None and x or y"
        # idiom would turn stop=0 into len(source).
        self.start = start if start is not None else 0
        self.stop = stop if stop is not None else len(source)
        return self
    def __str__(self):
        return self.source[self.start:self.stop]
    def __len__(self):
        return self.stop - self.start
    def partition(self, sep):
        _src = self.source
        try:
            startsep = _src.index(sep, self.start, self.stop)
        except ValueError:
            # Separator wasn't found!
            return self, _NULL_STR, _NULL_STR
        # Return new views of the three string parts
        endsep = startsep + len(sep)
        return (strview(_src, self.start, startsep),
                strview(_src, startsep, endsep),
                strview(_src, endsep, self.stop))

_NULL_STR = strview('')

def splitview(s):
    rest = strview(s)
    while 1:
        prefix, found, rest = rest.partition("{")
        if prefix:
            yield (None, str(prefix))
        if not found:
            break
        first, found, rest = rest.partition(" ")
        if not found:
            break
        second, found, rest = rest.partition("}")
        if not found:
            break
        yield (str(first), str(second))


#  Split by str.partition.
def splitpartition(s):
    rest = s
    while 1:
        prefix, found, temp = rest.partition("{")
        first, found, temp = temp.partition(" ")
        second, found, temp = temp.partition("}")
        if not found: break
        yield None, prefix
        yield fir

Re: [Python-3000] long/int unification

2006-08-26 Thread Marcin 'Qrczak' Kowalczyk
Josiah Carlson <[EMAIL PROTECTED]> writes:

> Also, depending on the objects, one may consider a few other tagged
> objects, like perhaps None, True, and False

I doubt that it's worth it: they are not dynamically computed anyway,
so there is little gain (only avoiding manipulating their refcounts),
and the loss is a greater number of special cases when accessing
contents of every object.

> or even just use 31/63 bits for the tagged integer value, with a 1
> in the lowest bit signifying it as a tagged integer.

This is exactly what my compiler of my language does.
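
For readers unfamiliar with the scheme, here is a toy model of the bit layout in Python. It is only a sketch: real implementations do this on machine words in C, Python integers are unbounded so the 31/63-bit limit is not modelled, and the function names are made up for the example.

```python
def tag_int(n):
    """Encode n as a 'word' whose lowest bit is 1."""
    return (n << 1) | 1

def is_tagged_int(word):
    """Aligned pointers have a 0 in the lowest bit, tagged integers a 1."""
    return (word & 1) == 1

def untag_int(word):
    """Recover the integer; the arithmetic shift keeps the sign."""
    return word >> 1
```

Any code that receives such a word has to call is_tagged_int() before treating it as a pointer, which is where the extra checks in this scheme come from.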

-- 
   __("< Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Terry Reedy

"Nick Coghlan" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]

> I think an enriched slicing model that allows sequence views to be 
> expressed
> easily as "this slice of this sequence" would allow this to be dealt with
> cleanly, without requiring every sequence to provide a corresponding 
> "sequence
> view" with non-copying semantics.

I think this is promising.  I like the potential unification.

> Should I make a Py3k PEP for this?

I think so ;-)

tjr





Re: [Python-3000] long/int unification

2006-08-26 Thread Guido van Rossum
On 8/25/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
>
> > In the integer case, it reminds me of James Knight's tagged integer
> > patch to 2.3 [1].  If using long exclusively is 50% slower, why not try
> > the improved speed approach?
>
> looks like GvR was -1000 on this idea at the time, though...

I still am, because it requires extra tests for every incref and
decref and also for every use of an object's type pointer. I worry
about the cost of these tests, but I worry much more about the bugs it
will add when people don't test first. ABC used this approach and we
kept finding bugs due to this problem.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Guido van Rossum
Can you explain in a sentence or two how these changes would be
*used*? Your code examples don't speak for themselves (maybe because
it's Saturday morning :-). Short examples of something clumsy and/or
slow that we'd have to write today compared to something fast and
elegant that we could write after the change would be quite helpful.
The exact inheritance relationship between slice and [x]range seems a
fairly uninteresting detail in comparison.

--Guido

On 8/26/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> [clipped]

Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Josiah Carlson

Nick Coghlan <[EMAIL PROTECTED]> wrote:
> 
> This idea is inspired by the find/rfind string discussion (particularly a 
> couple of comments from Jim and Ron), but I think the applicability may prove 
> to be wider than just string methods (e.g. I suspect it may prove useful for 
> the bytes() type as well).

A couple comments...

I don't particularly like the idea of using lists (or really iter(list) ),
range, or slice objects as defining what indices remain for a particular
string operation.  It just doesn't seem like the *right* thing to do.

> There are a couple of existing workarounds for this: buffer() objects, and 
> the 
> start/stop arguments to a variety of string methods. Neither of these is 
> particular convenient to work with, and buffer() is slated to go away in Py3k.

Ahh, but string views offer a significantly more reasonable mechanism.

string = stringview(string)

Now, you can do things like partition(), slicing (with step=1), etc., and
all can return further string views.  Users don't need to learn a new
semantic (pass the sequence of indices).  We can toss all of the
optional start, stop arguments to all string functions, and replace them
with either of the following:
result = stringview(string, start=None, stop=None).method(args)

string = stringview(string)
result = string[start:stop].method(args)
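
A minimal sketch of what such a view could look like in pure Python. No stringview type exists in the standard library; the class name, its constructor signature and the single find() method shown here are assumptions made for illustration only.

```python
class StringView:
    """Hypothetical non-copying window onto a str (step=1 slices only)."""

    def __init__(self, source, start=0, stop=None):
        self.source = source
        self.start = start
        self.stop = len(source) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # The only place where characters are actually copied
        return self.source[self.start:self.stop]

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("only step=1 slices are supported")
        stop = max(stop, start)
        # Slicing a view yields another view onto the same source
        return StringView(self.source, self.start + start, self.start + stop)

    def find(self, sub):
        # Delegate to str.find with start/stop, then renumber relative to the view
        pos = self.source.find(sub, self.start, self.stop)
        return -1 if pos < 0 else pos - self.start
```

Usage would then follow the second form above, for example StringView(text)[6:].find("r"), with positions reported relative to the view rather than to the original string.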


Perhaps one of the reasons why I prefer string views over this indices
mechanism is because I'm familiar with buffers, the idea of just having
a pointer into another structure, etc.  It just feels more natural from
my 8 years of C and 6 years of Python.


 - Josiah



Re: [Python-3000] find -> index patch

2006-08-26 Thread Jack Diederich
On Thu, Aug 24, 2006 at 03:48:57PM +0200, Fredrik Lundh wrote:
> Michael Chermside wrote:
> 
> >> WOW, I love partition.  In all the instances that weren't a simple "in"
> >> test I ended up using [r]partition.  In some cases one of the returned
> >> strings gets thrown away but in those cases it is guaranteed to be small.
> >> The new code is usually smaller than the old and generally clearer.
> >
> > Wow. That's just beautiful. This has now convinced me that dumping
> > [r]find() (at least!) and pushing people toward using partition will
> > result in pain in the short term (of course), and beautiful, readable
> > code in the long term.
> 
> note that partition provides an elegant solution to an important *subset*
> of all problems addressed by find/index.
> 
> just like lexical scoping vs. default arguments and map vs. list
> comprehensions, it doesn't address all problems right out of the box, and
> shouldn't be advertised as doing that.
> 

After some benchmarking, it turns out that find() can't go away without really
hurting readline() performance.  partition performs as well as find for small
lines, but for large lines the extra copy to concat the newline separator is a
killer (twice as slow for 50k char lines).  index has the opposite problem: the
overhead of setting up a try block makes 50 char lines twice as slow even when
the except clause is never triggered.
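
The benchmarked code was not posted, so the following is only a guess at the three variants being compared: splitting one line (including its newline) off the front of a buffer. The function names are hypothetical and no timing claims are attached to this sketch.

```python
def split_with_find(buf):
    i = buf.find("\n")
    if i < 0:
        return buf, ""
    return buf[:i + 1], buf[i + 1:]

def split_with_partition(buf):
    line, sep, rest = buf.partition("\n")
    # Re-attaching the separator copies the whole line a second time
    return line + sep, rest

def split_with_index(buf):
    try:
        i = buf.index("\n")
    except ValueError:
        return buf, ""
    return buf[:i + 1], buf[i + 1:]
```

All three return the same (line, rest) pair; they differ only in how the newline position is discovered and how many copies are made along the way.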

A version of partition that returned two values instead of three would solve
the problem, but that would just be adding more functions to remove the two
finds, or adding behavior flags to partition.  Ick.

Most uses of find are better off using partition but if this one case can't
be beat there must be others too.

-Jack


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Jim Jewett
On 8/26/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Nick Coghlan <[EMAIL PROTECTED]> wrote:

> > There are a couple of existing workarounds for
> > this: buffer() objects, and the start/stop arguments
> > to a variety of string methods. Neither of these is
> > particular convenient to work with, and buffer() is
> > slated to go away in Py3k.

> Ahh, but string views offer a significantly more
> reasonable mechanism.

As I understand it, Nick is suggesting that slice objects be used as a
sequence (not just string) view.


> string = stringview(string)
> ...  We can toss all of the optional start, stop
> arguments to all string functions, and replace them
> with either of the following:
> result = stringview(string, start=None, stop=None).method(args)

> string = stringview(string)
> result = string[start:stop].method(args)

Under Nick's proposal, I believe we could replace it with just the final line.

result = string[start:stop].method(args)

though there is a chance that (when you want to avoid copying) he is
suggesting explicit slice objects such as

view=slice(start, stop)
result = view(string).method(args)

-jJ


[Python-3000] path in py3K Re: [Python-checkins] r51624 - in python/trunk/Lib: genericpath.py macpath.py ntpath.py os2emxpath.py posixpath.py test/test_genericpath.py

2006-08-26 Thread Jim Jewett
In Py3K, is it still safe to assume that a list of paths will be
(enough like) ordinary strings?

I ask because of the various Path object discussions; it wasn't clear
that a Path object should be a sequence of (normalized unicode?)
characters (rather than path components), that the path would always
be normalized or absolute, or even that it would implement the LE (or
LT?) comparison operator.

-jJ

On 8/26/06, jack.diederich <[EMAIL PROTECTED]> wrote:
> Author: jack.diederich
> Date: Sat Aug 26 20:42:06 2006
> New Revision: 51624

> Added: python/trunk/Lib/genericpath.py

> +# Return the longest prefix of all list elements.
> +def commonprefix(m):
> +    "Given a list of pathnames, returns the longest common leading component"
> +    if not m: return ''
> +    s1 = min(m)
> +    s2 = max(m)
> +    n = min(len(s1), len(s2))
> +    for i in xrange(n):
> +        if s1[i] != s2[i]:
> +            return s1[:i]
> +    return s1[:n]

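
The question matters because the quoted function compares its arguments character by character and relies on min(), max() and indexing. A short sketch of that behaviour in present-day Python 3, using posixpath so the result does not depend on the platform; commonpath() was added much later (Python 3.5) and is shown only as a contrast.

```python
import posixpath

paths = ["/usr/lib", "/usr/local/bin"]

# Character-wise: the result need not be a valid path component boundary
char_prefix = posixpath.commonprefix(paths)

# Component-wise alternative from later Python versions
component_prefix = posixpath.commonpath(paths)
```

Here char_prefix stops in the middle of a directory name, which only works because the paths behave like ordinary strings.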


Re: [Python-3000] find -> index patch

2006-08-26 Thread Guido van Rossum
On 8/26/06, Jack Diederich <[EMAIL PROTECTED]> wrote:
> After some benchmarking find() can't go away without really hurting readline()
> performance.

Can you elaborate? readline() is typically implemented in C so I'm not
sure I follow.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Guido van Rossum
On 8/26/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 8/26/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > Nick Coghlan <[EMAIL PROTECTED]> wrote:
>
> > > There are a couple of existing workarounds for
> > > this: buffer() objects, and the start/stop arguments
> > > to a variety of string methods. Neither of these is
> > > particularly convenient to work with, and buffer() is
> > > slated to go away in Py3k.
>
> > Ahh, but string views offer a significantly more
> > reasonable mechanism.
>
> As I understand it, Nick is suggesting that slice objects be used as a
> sequence (not just string) view.

I have a hard time parsing this sentence. A slice is an object with
three immutable attributes -- start, stop, step. How does this double
as a string view?

> > string = stringview(string)
> > ...  We can toss all of the optional start, stop
> > arguments to all string functions, and replace them
> > with either of the following:
> > result = stringview(string, start=None, stop=None).method(args)
>
> > string = stringview(string)
> > result = string[start:stop].method(args)
>
> Under Nick's proposal, I believe we could replace it with just the final line.

I still don't see the transformation of clumsy to elegant. Please give
me a complete, specific example instead of a generic code snippet.
(Also, please don't use 'string' as a variable name. There's a module
by that name that I can't get out of my head.)

Maybe the idea is that instead of

  pos = s.find(t, pos)

we would write

  pos += stringview(s)[pos:].find(t)

???

And how is that easier on the eyes? (And note the need to use +=
because the sliced view renumbers the positions in the original
string.)
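
The two spellings can be compared with ordinary strings. In this sketch a plain str slice (which copies) stands in for the hypothetical stringview, and the helper names are made up for illustration; the point is only how the returned index has to be adjusted.

```python
def find_direct(s, t, pos):
    """Search from pos; the result is an index into s, or -1 on a miss."""
    return s.find(t, pos)


def find_via_slice(s, t, pos):
    """Search a slice; the result is relative to the slice, so add pos back.

    s[pos:] copies here; a non-copying view would be used the same way.
    """
    rel = s[pos:].find(t)
    return -1 if rel == -1 else pos + rel
```

A bare `pos += ...find(t)` would turn a miss into pos - 1, which looks like a valid index, whereas the start-argument form yields -1 directly.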

> result = string[start:stop].method(args)
>
> though there is a chance that (when you want to avoid copying) he is
> suggesting explicit slice objects such as
>
> view=slice(start, stop)
> result = view(string).method(args)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] path in py3K Re: [Python-checkins] r51624 - in python/trunk/Lib: genericpath.py macpath.py ntpath.py os2emxpath.py posixpath.py test/test_genericpath.py

2006-08-26 Thread Guido van Rossum
It is not my intention to adopt the Path module in Py3k.

On 8/26/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> In Py3K, is it still safe to assume that a list of paths will be
> (enough like) ordinary strings?
>
> I ask because of the various Path object discussions; it wasn't clear
> that a Path object should be a sequence of (normalized unicode?)
> characters (rather than path components), that the path would always
> be normalized or absolute, or even that it would implement the LE (or
> LT?) comparison operator.
>
> -jJ
>
> On 8/26/06, jack.diederich <[EMAIL PROTECTED]> wrote:
> > Author: jack.diederich
> > Date: Sat Aug 26 20:42:06 2006
> > New Revision: 51624
>
> > Added: python/trunk/Lib/genericpath.py
>
> > +# Return the longest prefix of all list elements.
> > +def commonprefix(m):
> > +    "Given a list of pathnames, returns the longest common leading component"
> > +    if not m: return ''
> > +    s1 = min(m)
> > +    s2 = max(m)
> > +    n = min(len(s1), len(s2))
> > +    for i in xrange(n):
> > +        if s1[i] != s2[i]:
> > +            return s1[:i]
> > +    return s1[:n]


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Jim Jewett
On 8/26/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 8/26/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > On 8/26/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > > Nick Coghlan <[EMAIL PROTECTED]> wrote:

> > > > There are a couple of existing workarounds for
> > > > this: buffer() objects, and the start/stop
> > > > arguments to a variety of string methods.

> > > Ahh, but string views offer a significantly more
> > > reasonable mechanism.

> > As I understand it, Nick is suggesting that slice
> > objects be used as a sequence (not just string)
> > view.

> I have a hard time parsing this sentence. A slice is
> an object with three immutable attributes -- start,
> stop, step. How does this double as a string view?

Poor wording on my part; it is (the application of a slice to a
specific sequence) that could act as a copyless view.

For example, you wanted to keep the rarely used optional arguments to
find because of efficiency.

s.find(prefix, start, stop)

does not copy.  If slices were less eager at copying, this could be
rewritten as

view=slice(start, stop, 1)
view(s).find(prefix)

or perhaps even as

s[start:stop].find(prefix)

I'm not sure these look better, but they are less surprising, because
they don't depend on optional arguments that most people have
forgotten about.


> Maybe the idea is that instead of

>   pos = s.find(t, pos)

> we would write

>   pos += stringview(s)[pos:].find(t)

> ???

With stringviews, you wouldn't need to be reindexing from the start of
the original string.  The idiom would instead be a generalization of
"for line in file:"

while data:
    chunk, sep, data = data.partition()

but the partition call would not need to copy the entire string; it
could simply return three views.

Yes, this does risk keeping all of data alive because one chunk was
saved.  This might be a reasonable tradeoff to avoid the copying.  If
not, perhaps the gc system could be augmented to shrink bloated views
during idle moments.
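
A runnable version of the loop, using today's copying str.partition, might look like the sketch below. The generator name, the newline default and the sample input are assumptions for illustration; note that partition() itself requires a separator argument.

```python
def chunks(data, sep="\n"):
    """Yield the pieces of data that lie between occurrences of sep."""
    while data:
        # partition returns (head, separator, tail); tail becomes the new data.
        chunk, found, data = data.partition(sep)
        yield chunk


for piece in chunks("alpha\nbeta\ngamma"):
    print(piece)
```

With ordinary strings each call copies the remaining tail, so a long input is copied repeatedly; returning views is what would remove that cost.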

-jJ


Re: [Python-3000] find -> index patch

2006-08-26 Thread Jack Diederich
On Sat, Aug 26, 2006 at 07:51:03PM -0700, Guido van Rossum wrote:
> On 8/26/06, Jack Diederich <[EMAIL PROTECTED]> wrote:
> > After some benchmarking find() can't go away without really hurting
> > readline() performance.
> 
> Can you elaborate? readline() is typically implemented in C so I'm not
> sure I follow.
> 

A number of modules in Lib have readline() methods that currently use find():
StringIO, httplib, tarfile, and others.

sprat:~/src/python-head/Lib# grep 'def readline' *.py | wc -l
30

Mainly I wanted to point out that find() solves a class of problems that
can't be solved equally well with partition() (bad for large strings that
want to preserve the separator) or index() (bad for large numbers of small
strings and for frequent misses).  I wanted to reach the conclusion that
find() could be yanked out, but as Fredrik opined, it is still useful for a
subset of problems.
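
The pattern in question looks roughly like the following sketch of a buffered readline(). The class and attribute names are hypothetical and this is not the code of any of the modules listed; it only shows find() with a start offset keeping the separator attached to the line and handling a miss with an integer test.

```python
class LineBuffer:
    """Hypothetical in-memory buffer with a find()-based readline()."""

    def __init__(self, data):
        self.buf = data
        self.pos = 0

    def readline(self):
        # Search from the current position; no slice of the tail is created.
        i = self.buf.find("\n", self.pos)
        if i < 0:
            # Miss: the rest of the buffer is the last (unterminated) line.
            end = len(self.buf)
        else:
            # Hit: include the newline in the returned line.
            end = i + 1
        line = self.buf[self.pos:end]
        self.pos = end
        return line
```

Only the returned line is copied; a partition()-based version would also copy the remaining buffer on every call, and an index()-based one would raise and catch an exception on every miss.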

-Jack


Re: [Python-3000] Making more effective use of slice objects in Py3k

2006-08-26 Thread Josiah Carlson

"Jim Jewett" <[EMAIL PROTECTED]> wrote:
> With stringviews, you wouldn't need to be reindexing from the start of
> the original string.  The idiom would instead be a generalization of
> "for line in file:"
> 
> while data:
>     chunk, sep, data = data.partition()
> 
> but the partition call would not need to copy the entire string; it
> could simply return three views.

Also, with a little work, having string views be smart about
concatenation (if two views are adjacent to each other, like chunk,sep
or sep,data above, view1+view2 -> view3 on the original string), copies
could further be minimized, and the earlier problem with readline, etc.,
can be avoided.
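
One way to picture the adjacency rule is the sketch below. StrView is a hypothetical class written for illustration, not an existing or proposed API; it merges two views only when they share the same base string and the first ends where the second begins, and otherwise falls back to an ordinary copying concatenation.

```python
class StrView:
    """Hypothetical read-only view over base[start:stop]."""

    def __init__(self, base, start=0, stop=None):
        self.base = base
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __str__(self):
        # Materialise the viewed text; this is the only place a copy is made.
        return self.base[self.start:self.stop]

    def __add__(self, other):
        adjacent = (
            isinstance(other, StrView)
            and other.base is self.base
            and self.stop == other.start
        )
        if adjacent:
            # Same base, touching ranges: return a wider view, no copying.
            return StrView(self.base, self.start, other.stop)
        # Anything else: fall back to a normal string concatenation.
        return str(self) + str(other)
```

For example, views over "key" and "=" taken from the same "key=value" string add up to a single view over "key=", while "key" + "value" are not adjacent and produce a new string.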

 - Josiah
