On Jul 16, 3:12 pm, seldan24 <selda...@gmail.com> wrote:
> On Jul 15, 1:48 pm, Emile van Sebille <em...@fenx.com> wrote:
> > On 7/15/2009 10:23 AM MRAB said...
> > >> On Jul 15, 12:47 pm, Michiel Overtoom <mot...@xs4all.nl> wrote:
> > >>> seldan24 wrote:
> > >>>> what can I use as the equivalent for the Unix 'fold' command?
> > >>> def fold(s,len):
> > >>>     while s:
> > >>>         print s[:len]
> > >>>         s=s[len:]
> > > <snip>
> > > You might still need to tweak the above code as regards how line endings
> > > are handled.
> >
> > You might also want to tweak it if the strings are _really_ long to
> > simply slice out the substrings as opposed to reassigning the balance to
> > a newly created s on each iteration.
> >
> > Emile
>
> Thanks for all of the help. I'm almost there. I have it working now,
> but the 'fold' piece is very slow. When I use the 'fold' command in
> shell it is almost instantaneous. I was able to do the EBCDIC->ASCII
> conversion using the decode method in the built-in str type. I didn't
> have to import the codecs module. I just decoded the data to cp037
> which works fine.
>
> So now, I'm left with a large file, consisting of one extremely long
> line of ASCII data that needs to be sliced up into 35 character
> lines. I did the following, which works but takes a very long time:
>
> f = open(ascii_file, 'w')
> while ascii_data:
>     f.write(ascii_data[:len])
>     ascii_data = ascii_data[len:]
> f.close()
>
> I know that Emile suggested that I can slice out the substrings rather
> than do the gradual trimming of the string variable as is being done
> by moving around the length. So, I'm going to give that a try... I'm
> a bit confused by what that means, am guessing that slice can break up
> a string based on characters; will research. Thanks for the help thus
> far. I'll post again when all is working fine.

Assuming your rather large text file is 1 meg long, you have 1 million
characters in there. 1000000/35 = ~29k lines. The size of the remaining
string decreases linearly, so the average size is (1000000 + 0) / 2, or
500k. All said and done, you're allocating and copying a 500K string on
average, not once, but about 29 thousand times. That adds up to roughly
14 billion characters copied just to fold a 1 meg file. That's where
your slowdown resides.
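
What Emile meant by slicing out the substrings is to step an index through the string and take one 35-character slice per step, leaving the original string untouched. Here is a minimal sketch of that idea, not a tested drop-in replacement. It is written in Python 3 syntax, and the names fold, fold_to_file, text, width and path are only illustrative. It also assumes you want a newline after each chunk, which the loop you posted does not write.

```python
def fold(text, width=35):
    """Return text cut into chunks of at most `width` characters."""
    # Each slice copies only `width` characters; the long string itself
    # is never rebuilt, so the total work grows linearly with its length.
    return [text[i:i + width] for i in range(0, len(text), width)]


def fold_to_file(text, path, width=35):
    """Write the chunks to `path`, one chunk per line."""
    with open(path, "w") as f:
        # Assumption: every chunk should end with a newline.
        f.writelines(chunk + "\n" for chunk in fold(text, width))
```

With the variables from your message, the call would be something like fold_to_file(ascii_data, ascii_file). I also avoided calling the width `len`, since that hides the built-in len() function.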