Re: Fastest technique for string concatenation
On Oct 5, 2:39 pm, Will Hall wrote: > On Oct 3, 8:19 am, Roy Smith wrote: > > > > > My local news feed seems to have lost the early part of this thread, so > > I'm afraid I don't know who I'm quoting here: > > > > My understanding is that appending to a list and then joining > > > this list when done is the fastest technique for string > > > concatenation. Is this true? > > > > The 3 string concatenation techniques I can think of are: > > > > - append to list, join > > > - string 'addition' (s = s + char) > > > - cStringIO > > > There is a fourth technique, and that is to avoid concatenation in the > > first place. One possibility is to use the common append/join pattern > > mentioned above: > > > vector = [] > > while (stuff happens): > > vector.append(whatever) > > my_string = ''.join(vector) > > > But, it sometimes (often?) turns out that you don't really need > > my_string. It may just be a convenient way to pass the data on to the > > next processing step. If you can arrange your code so the next step can > > take the vector directly, you can avoid creating my_string at all. > > > For example, if all you're going to do is write the string out to a file > > or network socket, you could user vectored i/o, with something like > > python-writev (http://pypi.python.org/pypi/python-writev/1.1). If > > you're going to iterate over the string character by character, you > > could write an iterator which does that without the intermediate copy. > > Something along the lines of: > > > def each(self): > > for s in self.vector: > > for c in s: > > yield c > > > Depending on the amount of data you're dealing with, this could be a > > significant improvement over doing the join(). > > Okay. I've never responded to one of these before, so please correct > me if I'm making any large blunders. I'd just recently read Guido's > Python Patterns -- An Optimization Anecdote, and I was wondering why a > similar method to the one he suggests wouldn't work here? > > My suggestion: > def arrayConcat(): > output = array.array('c', source).tostring() > > Am I missing something, or will this work? > > Thanks, > Will My bad, I forgot the 'import array' statement at the top. -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
On Oct 3, 8:19 am, Roy Smith wrote: > My local news feed seems to have lost the early part of this thread, so > I'm afraid I don't know who I'm quoting here: > > > My understanding is that appending to a list and then joining > > this list when done is the fastest technique for string > > concatenation. Is this true? > > > The 3 string concatenation techniques I can think of are: > > > - append to list, join > > - string 'addition' (s = s + char) > > - cStringIO > > There is a fourth technique, and that is to avoid concatenation in the > first place. One possibility is to use the common append/join pattern > mentioned above: > > vector = [] > while (stuff happens): > vector.append(whatever) > my_string = ''.join(vector) > > But, it sometimes (often?) turns out that you don't really need > my_string. It may just be a convenient way to pass the data on to the > next processing step. If you can arrange your code so the next step can > take the vector directly, you can avoid creating my_string at all. > > For example, if all you're going to do is write the string out to a file > or network socket, you could user vectored i/o, with something like > python-writev (http://pypi.python.org/pypi/python-writev/1.1). If > you're going to iterate over the string character by character, you > could write an iterator which does that without the intermediate copy. > Something along the lines of: > > def each(self): > for s in self.vector: > for c in s: > yield c > > Depending on the amount of data you're dealing with, this could be a > significant improvement over doing the join(). Okay. I've never responded to one of these before, so please correct me if I'm making any large blunders. I'd just recently read Guido's Python Patterns -- An Optimization Anecdote, and I was wondering why a similar method to the one he suggests wouldn't work here? My suggestion: def arrayConcat(): output = array.array('c', source).tostring() Am I missing something, or will this work? Thanks, Will -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
My local news feed seems to have lost the early part of this thread, so I'm afraid I don't know who I'm quoting here: > My understanding is that appending to a list and then joining > this list when done is the fastest technique for string > concatenation. Is this true? > > The 3 string concatenation techniques I can think of are: > > - append to list, join > - string 'addition' (s = s + char) > - cStringIO There is a fourth technique, and that is to avoid concatenation in the first place. One possibility is to use the common append/join pattern mentioned above: vector = [] while (stuff happens): vector.append(whatever) my_string = ''.join(vector) But, it sometimes (often?) turns out that you don't really need my_string. It may just be a convenient way to pass the data on to the next processing step. If you can arrange your code so the next step can take the vector directly, you can avoid creating my_string at all. For example, if all you're going to do is write the string out to a file or network socket, you could user vectored i/o, with something like python-writev (http://pypi.python.org/pypi/python-writev/1.1). If you're going to iterate over the string character by character, you could write an iterator which does that without the intermediate copy. Something along the lines of: def each(self): for s in self.vector: for c in s: yield c Depending on the amount of data you're dealing with, this could be a significant improvement over doing the join(). -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
pyt...@bdurham.com wrote: > My understanding is that appending to a list and then joining > this list when done is the fastest technique for string > concatenation. Is this true? > > The 3 string concatenation techniques I can think of are: > > - append to list, join > - string 'addition' (s = s + char) > - cStringIO > > The code that follows my signature confirms that the list > append/join technique is indeed the fastest of these 3 > approaches, but perhaps there are other techniques I should > consider? > > Malcolm > > # test various techniques for string concatenation > import cStringIO > import timeit > source = 'x' * 500 > def testListAppend(): > output = list() > for char in source: > output.append( char ) > output = ''.join( output ) You are measuring list creation here, not string concatenation: $ python -m timeit -s's = "x"*10**5' 'd = []' 'for c in s: d.append(c)' 10 loops, best of 3: 20.9 msec per loop $ python -m timeit -s's = ["x"]*10**5' '"".join(s)' 100 loops, best of 3: 2.47 msec per loop To fix that I'd start with a list, not a string: source = ["x"] * 500 > def testStringIO(): > output = cStringIO.StringIO() > for char in source: > output.write( char ) > output = output.getvalue() You can simplify that to def test_stringio(): output = cStringIO.StringIO() output.writelines(source) # doesn't add newlines output = output.getvalue() which is almost as fast as ''.join: $ python -m timeit -s's = ["x"]*10**5' '"".join(s)' 100 loops, best of 3: 2.47 msec per loop $ python -m timeit -s's = ["x"]*10**5' -s 'from cStringIO import StringIO' 'o = StringIO(); o.writelines(s); o.getvalue()' 100 loops, best of 3: 2.88 msec per loop Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
On Sat, 02 Oct 2010 13:17:02 -0700, Carey Tilden wrote: > Have you profiled an application and found string concatenation to be > a performance bottleneck? I would be surprised, but it's always > possible. If not, I'd suggest you choose the technique that is most > clear and concise, and worry about performance only if it becomes a > real issue. While I agree with your general point about premature optimization, I would not be the tiniest bit surprised to discover that string concatenation was a performance bottleneck -- and that it only shows up under some platforms and not others! Repeated string concatenation risks being an O(n**2) algorithm, which performs terribly. Last year there was a bug report of awful performance in the httplib module that only effected Windows users: http://www.mail-archive.com/python-dev%40python.org/msg40692.html httplib was FOUR HUNDRED times slower than IE at copying a file over a local network. Eventually the fault was tracked down to repeated string concatenation in the module: http://www.mail-archive.com/python-dev%40python.org/msg41150.html It costs nothing to avoid repeated string concatenation even prior to demonstrating a problem. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
Carey, > Have you profiled an application and found string concatenation to be a > performance bottleneck? I would be surprised, but it's always possible. The "application" is very simple - its essentially a finite state machine that parses complex RTF files. We read char by char and do lots of string concatenation as we move through our various states. Using lists for concatenation and Emile's technique of eliminating repeated attribute lookups, we have significantly improved the performance of our utilities. Malcolm -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
Emile, > Your times will improve when not burdened by the repeated method lookups and > element-wise list creation. Excellent point!! Here's updated timings for each technique followed by copy and paste source code for anyone that wants to fiddle with these tests. I've placed your name above your script suggestions. My results (Python 2.7/32-bit on Window 7/64-bit) testListAppend = 10.95 sec testStringConcat = 49.49 sec testStringIO = 14.68 sec testListAppend2 = 7.42 sec < fastest !!! testListAppend3 = 8.22 sec testStringIO2 = 10.91 sec Thanks for the feedback - really appreciated. Malcolm # test performance of various string concatenation techniques import cStringIO import timeit source = 'x' * 500 def testListAppend(): output = list() for char in source: output.append( char ) output = ''.join( output ) def testStringConcat(): output = '' for char in source: output += char def testStringIO(): output = cStringIO.StringIO() for char in source: output.write( char ) output = output.getvalue() # "Emile van Sebille" : # Your times will improve when not burdened by the repeated method lookups and element-wise list creation: def testListAppend2(): output = list() append = output.append for char in source: append( char ) output = ''.join( output ) def testListAppend3(): input = list( source ) output = list() append = output.append for char in input: append( char ) output = ''.join( output ) def testStringIO2(): output = cStringIO.StringIO() write = output.write for char in source: write( char ) output = output.getvalue() def time( func ): timingObject = timeit.Timer( func ) runtime = timingObject.timeit( 10 ) print '%s = %.2f sec' % ( func.__name__, runtime ) time( testListAppend ) time( testStringConcat ) time( testStringIO ) time( testListAppend2 ) time( testListAppend3 ) time( testStringIO2 ) -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
On Sat, Oct 2, 2010 at 12:09 PM, wrote: > My understanding is that appending to a list and then joining this list when > done is the fastest technique for string concatenation. Is this true? Have you profiled an application and found string concatenation to be a performance bottleneck? I would be surprised, but it's always possible. If not, I'd suggest you choose the technique that is most clear and concise, and worry about performance only if it becomes a real issue. Carey -- http://mail.python.org/mailman/listinfo/python-list
Re: Fastest technique for string concatenation
On 10/2/2010 12:09 PM pyt...@bdurham.com said... Your times will improve when not burdened by the repeated method lookups and element-wise list creation: try with eg, def testListAppend2(): output = list() append = output.append for char in source: append( char ) output = ''.join( output ) and even further with def testListAppend3(): output = list(source) output = ''.join( output ) Same with: def testStringIO2(): output = cStringIO.StringIO() write = output.write for char in source: write( char ) output = output.getvalue() Emile -- http://mail.python.org/mailman/listinfo/python-list