Re: Simple Text Processing Help

2007-10-17 Thread Tim Roberts
[EMAIL PROTECTED] wrote: > >And now for something completely different... > >I've been reading up a bit about Python and Excel and I quickly told >the program to output to Excel quite easily. However, what if the >input file were a Word document? I can't seem to find much >information about parsi

Re: Simple Text Processing Help

2007-10-16 Thread patrick . waldo
And now for something completely different... I've been reading up a bit about Python and Excel and I quickly told the program to output to Excel quite easily. However, what if the input file were a Word document? I can't seem to find much information about parsing Word files. What could I add

Re: Simple Text Processing Help

2007-10-16 Thread patrick . waldo
And now for something completely different... I see a lot of COM stuff with Python for excel...and I quickly made the same program output to excel. What if the input file were a Word document? Where is there information about manipulating word documents, or what could I add to make the same prog

Re: Simple Text Processing Help

2007-10-16 Thread Peter Otten
patrick.waldo wrote:
> manipulation? Also, I conceptually get it, but would you mind walking
> me through
>> for key, group in groupby(instream, unicode.isspace):
>>     if not key:
>>         yield "".join(group)

itertools.groupby() splits a sequence into groups with the same key; e. g
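
As a rough illustration of the groupby idiom quoted above, here is a minimal sketch in Python 3 syntax. It assumes `str.isspace` in place of the thread's Python 2 `unicode.isspace`; the function name `paragraphs` and the sample lines are made up for the example, with the values taken from the sample input posted elsewhere in the thread.

```python
from itertools import groupby


def paragraphs(lines):
    """Yield each run of consecutive non-blank lines as one string.

    `paragraphs` is a hypothetical name chosen for this sketch.
    """
    # str.isspace is True for whitespace-only lines such as "\n", so
    # groupby alternates between blank runs (key True) and record
    # runs (key False). Only the record runs are kept.
    for is_blank, group in groupby(lines, str.isspace):
        if not is_blank:
            yield "".join(group)


# Sample lines modelled on the input shown in the thread.
sample = [
    "200-001-8\t50-00-0\n",
    "formaldehyd CH2O\n",
    "\n",
    "200-002-3\n",
    "50-01-1\n",
    "guanidínium-chlorid CH5N3.ClH\n",
]

records = list(paragraphs(sample))
fields = [record.split() for record in records]
```

Each element of `fields` then holds the whitespace-separated tokens of one record, regardless of how the record was spread across lines.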

Re: Simple Text Processing Help

2007-10-15 Thread Paul McGuire
On Oct 14, 8:48 am, [EMAIL PROTECTED] wrote: > Hi all, > > I started Python just a little while ago and I am stuck on something > that is really simple, but I just can't figure out. > > Essentially I need to take a text document with some chemical > information in Czech and organize it into another

Re: Simple Text Processing Help

2007-10-15 Thread Paul Hankin
On Oct 15, 10:08 pm, [EMAIL PROTECTED] wrote: > Because of my limited Python knowledge, I will need to try to figure > out exactly how they work for future text manipulation and for my own > knowledge. Could you recommend some resources for this kind of text > manipulation? Also, I conceptually g

Re: Simple Text Processing Help

2007-10-15 Thread patrick . waldo
Wow, thank you all. All three work. To output correctly I needed to add: output.write("\r\n") This is really a great help!! Because of my limited Python knowledge, I will need to try to figure out exactly how they work for future text manipulation and for my own knowledge. Could you recommend

Re: Simple Text Processing Help

2007-10-15 Thread Peter Otten
patrick.waldo wrote:
> my sample input file looks like this( not organized,as you see it):
> 200-720-769-93-2
> kyselina mocová C5H4N4O3
>
> 200-001-8 50-00-0
> formaldehyd CH2O
>
> 200-002-3
> 50-01-1
> guanidínium-chlorid CH5N3.ClH

Assuming that the records are al

Re: Simple Text Processing Help

2007-10-15 Thread Paul Hankin
On Oct 15, 12:20 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Mon, 15 Oct 2007 10:47:16 +, patrick.waldo wrote:
> > my sample input file looks like this( not organized,as you see it):
> > 200-720-769-93-2
> > kyselina mocová C5H4N4O3
> >
> > 200-001-8 50-00-0
>

Re: Simple Text Processing Help

2007-10-15 Thread Marc 'BlackJack' Rintsch
On Mon, 15 Oct 2007 10:47:16 +, patrick.waldo wrote:
> my sample input file looks like this( not organized,as you see it):
> 200-720-769-93-2
> kyselina mocová C5H4N4O3
>
> 200-001-8 50-00-0
> formaldehyd CH2O
>
> 200-002-3
> 50-01-1
> guanidínium-chlorid CH5N3.C

Re: Simple Text Processing Help

2007-10-15 Thread patrick . waldo
> lines = open('your_file.txt').readlines()[:4]
> print lines
> print map(len, lines)

gave me:

['\xef\xbb\xbf200-720-769-93-2\n', 'kyselina mo\xc4\x8dov \xc3\xa1 C5H4N4O3\n', '\n', '200-001-8\t50-00-0\n']
[28, 32, 1, 18]

I think it means that I'm still at option 3. I got
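
The leading `\xef\xbb\xbf` in the first line of that output is the UTF-8 byte order mark. A minimal sketch, in Python 3 syntax rather than the Python 2 used in the thread, of one way to drop it while decoding; the byte string is copied from the first list item above, and the variable names are illustrative only.

```python
import codecs

# First line of the file as printed in the message above.
raw = b"\xef\xbb\xbf200-720-769-93-2\n"

# The three leading bytes are the UTF-8 byte order mark.
has_bom = raw.startswith(codecs.BOM_UTF8)

# Plain "utf-8" keeps the mark as the character U+FEFF;
# "utf-8-sig" removes it if present.
kept = raw.decode("utf-8")
text = raw.decode("utf-8-sig")
```

In Python 3 the same codec name can be passed when opening the file, as in `open(path, encoding="utf-8-sig")`, so the mark never reaches the parsing code.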

Re: Simple Text Processing Help

2007-10-14 Thread John Machin
On Oct 14, 11:48 pm, [EMAIL PROTECTED] wrote: > Hi all, > > I started Python just a little while ago and I am stuck on something > that is really simple, but I just can't figure out. > > Essentially I need to take a text document with some chemical > information in Czech and organize it into anothe

Re: Simple Text Processing Help

2007-10-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Oct 2007 16:57:06 +, patrick.waldo wrote:
> Thank you both for helping me out. I am still rather new to Python
> and so I'm probably trying to reinvent the wheel here.
>
> When I try to do Paul's response, I get tokens = line.strip().split()
> []

What is in `line`? Paul wrot

Re: Simple Text Processing Help

2007-10-14 Thread patrick . waldo
Thank you both for helping me out. I am still rather new to Python and so I'm probably trying to reinvent the wheel here.

When I try to do Paul's response, I get

>>> tokens = line.strip().split()
[]

So I am not quite sure how to read line by line.

tokens = input.read().split()

gets me all the in
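
One possible reading of the empty list: `line` held a blank line at that point, and a blank line splits to `[]`. The sketch below shows line-by-line reading under that assumption, in Python 3 syntax. `io.StringIO` stands in for the real input file so the example runs on its own, and the sample text is modelled on the data posted in the thread.

```python
import io

# Stand-in for an open text file (an assumption made so the sketch
# is self-contained).
source = io.StringIO("200-001-8\t50-00-0\nformaldehyd CH2O\n\n200-002-3\n")

tokens_per_line = []
for line in source:  # a file object yields one line per iteration
    tokens = line.strip().split()
    if tokens:  # a blank line splits to [], so it is skipped
        tokens_per_line.append(tokens)
```

By contrast, `read().split()` returns every token in the file as one flat list, which loses the line and record boundaries.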

Re: Simple Text Processing Help

2007-10-14 Thread Paul Hankin
On Oct 14, 2:48 pm, [EMAIL PROTECTED] wrote: > Hi all, > > I started Python just a little while ago and I am stuck on something > that is really simple, but I just can't figure out. > > Essentially I need to take a text document with some chemical > information in Czech and organize it into another

Re: Simple Text Processing Help

2007-10-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Oct 2007 13:48:51 +, patrick.waldo wrote: > Essentially I need to take a text document with some chemical > information in Czech and organize it into another text file. The > information is always EINECS number, CAS, chemical name, and formula > in tables. I need to organize them

Simple Text Processing Help

2007-10-14 Thread patrick . waldo
Hi all, I started Python just a little while ago and I am stuck on something that is really simple, but I just can't figure out. Essentially I need to take a text document with some chemical information in Czech and organize it into another text file. The information is always EINECS number, CAS