Re: Fastest technique for string concatenation

2010-10-05 Thread Will Hall
On Oct 5, 2:39 pm, Will Hall  wrote:
> On Oct 3, 8:19 am, Roy Smith  wrote:
>
>
>
> > My local news feed seems to have lost the early part of this thread, so
> > I'm afraid I don't know who I'm quoting here:
>
> > > My understanding is that appending to a list and then joining
> > > this list when done is the fastest technique for string
> > > concatenation. Is this true?
>
> > > The 3 string concatenation techniques I can think of are:
>
> > > - append to list, join
> > > - string 'addition' (s = s + char)
> > > - cStringIO
>
> > There is a fourth technique, and that is to avoid concatenation in the
> > first place.   One possibility is to use the common append/join pattern
> > mentioned above:
>
> > vector = []
> > while (stuff happens):
> >    vector.append(whatever)
> > my_string = ''.join(vector)
>
> > But, it sometimes (often?) turns out that you don't really need
> > my_string.  It may just be a convenient way to pass the data on to the
> > next processing step.  If you can arrange your code so the next step can
> > take the vector directly, you can avoid creating my_string at all.
>
> > For example, if all you're going to do is write the string out to a file
> > or network socket, you could user vectored i/o, with something like
> > python-writev (http://pypi.python.org/pypi/python-writev/1.1).  If
> > you're going to iterate over the string character by character, you
> > could write an iterator which does that without the intermediate copy.  
> > Something along the lines of:
>
> >     def each(self):
> >         for s in self.vector:
> >             for c in s:
> >                 yield c
>
> > Depending on the amount of data you're dealing with, this could be a
> > significant improvement over doing the join().
>
> Okay. I've never responded to one of these before, so please correct
> me if I'm making any large blunders.  I'd just recently read Guido's
> Python Patterns -- An Optimization Anecdote, and I was wondering why a
> similar method to the one he suggests wouldn't work here?
>
> My suggestion:
> def arrayConcat():
>     output = array.array('c', source).tostring()
>
> Am I missing something, or will this work?
>
> Thanks,
> Will

My bad, I forgot the 'import array' statement at the top.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-05 Thread Will Hall
On Oct 3, 8:19 am, Roy Smith  wrote:
> My local news feed seems to have lost the early part of this thread, so
> I'm afraid I don't know who I'm quoting here:
>
> > My understanding is that appending to a list and then joining
> > this list when done is the fastest technique for string
> > concatenation. Is this true?
>
> > The 3 string concatenation techniques I can think of are:
>
> > - append to list, join
> > - string 'addition' (s = s + char)
> > - cStringIO
>
> There is a fourth technique, and that is to avoid concatenation in the
> first place.   One possibility is to use the common append/join pattern
> mentioned above:
>
> vector = []
> while (stuff happens):
>    vector.append(whatever)
> my_string = ''.join(vector)
>
> But, it sometimes (often?) turns out that you don't really need
> my_string.  It may just be a convenient way to pass the data on to the
> next processing step.  If you can arrange your code so the next step can
> take the vector directly, you can avoid creating my_string at all.
>
> For example, if all you're going to do is write the string out to a file
> or network socket, you could user vectored i/o, with something like
> python-writev (http://pypi.python.org/pypi/python-writev/1.1).  If
> you're going to iterate over the string character by character, you
> could write an iterator which does that without the intermediate copy.  
> Something along the lines of:
>
>     def each(self):
>         for s in self.vector:
>             for c in s:
>                 yield c
>
> Depending on the amount of data you're dealing with, this could be a
> significant improvement over doing the join().

Okay. I've never responded to one of these before, so please correct
me if I'm making any large blunders.  I'd just recently read Guido's
Python Patterns -- An Optimization Anecdote, and I was wondering why a
similar method to the one he suggests wouldn't work here?

My suggestion:
def arrayConcat():
output = array.array('c', source).tostring()

Am I missing something, or will this work?

Thanks,
Will
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-03 Thread Roy Smith
My local news feed seems to have lost the early part of this thread, so 
I'm afraid I don't know who I'm quoting here:

> My understanding is that appending to a list and then joining
> this list when done is the fastest technique for string
> concatenation. Is this true?
> 
> The 3 string concatenation techniques I can think of are:
> 
> - append to list, join
> - string 'addition' (s = s + char)
> - cStringIO

There is a fourth technique, and that is to avoid concatenation in the 
first place.   One possibility is to use the common append/join pattern 
mentioned above:

vector = []
while (stuff happens):
   vector.append(whatever)
my_string = ''.join(vector)

But, it sometimes (often?) turns out that you don't really need 
my_string.  It may just be a convenient way to pass the data on to the 
next processing step.  If you can arrange your code so the next step can 
take the vector directly, you can avoid creating my_string at all.

For example, if all you're going to do is write the string out to a file 
or network socket, you could user vectored i/o, with something like 
python-writev (http://pypi.python.org/pypi/python-writev/1.1).  If 
you're going to iterate over the string character by character, you 
could write an iterator which does that without the intermediate copy.  
Something along the lines of:

def each(self):
for s in self.vector:
for c in s:
yield c

Depending on the amount of data you're dealing with, this could be a 
significant improvement over doing the join().
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-03 Thread Peter Otten
pyt...@bdurham.com wrote:

> My understanding is that appending to a list and then joining
> this list when done is the fastest technique for string
> concatenation. Is this true?
> 
> The 3 string concatenation techniques I can think of are:
> 
> - append to list, join
> - string 'addition' (s = s + char)
> - cStringIO
> 
> The code that follows my signature confirms that the list
> append/join technique is indeed the fastest of these 3
> approaches, but perhaps there are other techniques I should
> consider?
> 
> Malcolm
> 
> # test various techniques for string concatenation
> import cStringIO
> import timeit
> source = 'x' * 500
> def testListAppend():
> output = list()
> for char in source:
> output.append( char )
> output = ''.join( output )

You are measuring list creation here, not string concatenation:

$ python -m timeit -s's = "x"*10**5' 'd = []' 'for c in s: d.append(c)'
10 loops, best of 3: 20.9 msec per loop
$ python -m timeit -s's = ["x"]*10**5' '"".join(s)'
100 loops, best of 3: 2.47 msec per loop

To fix that I'd start with a list, not a string:

source = ["x"] * 500

> def testStringIO():
> output = cStringIO.StringIO()
> for char in source:
> output.write( char )
> output = output.getvalue()

You can simplify that to

def test_stringio():
output = cStringIO.StringIO()
output.writelines(source) # doesn't add newlines
output = output.getvalue()

which is almost as fast as ''.join:

$ python -m timeit -s's = ["x"]*10**5' '"".join(s)'
100 loops, best of 3: 2.47 msec per loop

$ python -m timeit -s's = ["x"]*10**5' -s 'from cStringIO import StringIO' 
'o = StringIO(); o.writelines(s); o.getvalue()'
100 loops, best of 3: 2.88 msec per loop

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-02 Thread Steven D'Aprano
On Sat, 02 Oct 2010 13:17:02 -0700, Carey Tilden wrote:

> Have you profiled an application and found string concatenation to be
> a performance bottleneck?  I would be surprised, but it's always
> possible.  If not, I'd suggest you choose the technique that is most
> clear and concise, and worry about performance only if it becomes a
> real issue.

While I agree with your general point about premature optimization, I 
would not be the tiniest bit surprised to discover that string 
concatenation was a performance bottleneck -- and that it only shows up 
under some platforms and not others!

Repeated string concatenation risks being an O(n**2) algorithm, which 
performs terribly. Last year there was a bug report of awful performance 
in the httplib module that only effected Windows users:

http://www.mail-archive.com/python-dev%40python.org/msg40692.html

httplib was FOUR HUNDRED times slower than IE at copying a file over a 
local network. Eventually the fault was tracked down to repeated string 
concatenation in the module:

http://www.mail-archive.com/python-dev%40python.org/msg41150.html

It costs nothing to avoid repeated string concatenation even prior to 
demonstrating a problem.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-02 Thread python
Carey,

> Have you profiled an application and found string concatenation to be a 
> performance bottleneck? I would be surprised, but it's always possible. 

The "application" is very simple - its essentially a finite state
machine that parses complex RTF files. We read char by char and do lots
of string concatenation as we move through our various states. Using
lists for concatenation and Emile's technique of eliminating repeated
attribute lookups, we have significantly improved the performance of our
utilities.

Malcolm
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-02 Thread python
Emile,

> Your times will improve when not burdened by the repeated method lookups and 
> element-wise list creation.

Excellent point!!

Here's updated timings for each technique followed by copy and paste
source code for anyone that wants to fiddle with these tests. I've
placed your name above your script suggestions.

My results (Python 2.7/32-bit on Window 7/64-bit)

testListAppend = 10.95 sec
testStringConcat = 49.49 sec
testStringIO = 14.68 sec
testListAppend2 = 7.42 sec < fastest !!!
testListAppend3 = 8.22 sec
testStringIO2 = 10.91 sec

Thanks for the feedback - really appreciated.

Malcolm


# test performance of various string concatenation techniques

import cStringIO
import timeit

source = 'x' * 500

def testListAppend():
output = list()
for char in source:
output.append( char )
output = ''.join( output )

def testStringConcat():
output = ''
for char in source:
output += char

def testStringIO(): 
output = cStringIO.StringIO()
for char in source:
output.write( char )
output = output.getvalue()


# "Emile van Sebille" :
# Your times will improve when not burdened by the repeated method
lookups and element-wise list creation:

def testListAppend2():
 output = list()
 append = output.append
 for char in source:
 append( char )
 output = ''.join( output )

def testListAppend3():
 input = list( source )
 output = list()
 append = output.append
 for char in input:
 append( char )
 output = ''.join( output )

def testStringIO2():
 output = cStringIO.StringIO()
 write = output.write
 for char in source:
 write( char )
 output = output.getvalue()

def time( func ):
timingObject = timeit.Timer( func )
runtime = timingObject.timeit( 10 )
print '%s = %.2f sec' % ( func.__name__, runtime )

time( testListAppend )
time( testStringConcat )
time( testStringIO )
time( testListAppend2 )
time( testListAppend3 )
time( testStringIO2 )
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-02 Thread Carey Tilden
On Sat, Oct 2, 2010 at 12:09 PM,  wrote:

> My understanding is that appending to a list and then joining this list when 
> done is the fastest technique for string concatenation. Is this true?

Have you profiled an application and found string concatenation to be
a performance bottleneck?  I would be surprised, but it's always
possible.  If not, I'd suggest you choose the technique that is most
clear and concise, and worry about performance only if it becomes a
real issue.

Carey
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest technique for string concatenation

2010-10-02 Thread Emile van Sebille

On 10/2/2010 12:09 PM pyt...@bdurham.com said...

Your times will improve when not burdened by the repeated method lookups 
and element-wise list creation:


try with eg,

def testListAppend2():
output = list()
append = output.append
for char in source:
append( char )
output = ''.join( output )

and even further with

def testListAppend3():
output = list(source)
output = ''.join( output )

Same with:

def testStringIO2():
output = cStringIO.StringIO()
write = output.write
for char in source:
write( char )
output = output.getvalue()

Emile


--
http://mail.python.org/mailman/listinfo/python-list