Re: [Tutor] Tutor Digest, Vol 38, Issue 10

2007-04-06 Thread Jay Mutter III
>
>
> "Jay Mutter III" <[EMAIL PROTECTED]> wrote
>
>
>> Whether I attempt to just strip the string or attempt to
>>
>> if line.endswith('No.\r'):
>>     line = line.rstrip()
>>
>> It doesn't work.
>
> Can you try printing the string repr just before the test.
> Or even the last 6 characters:
>
> print repr(line[-6:])
> if line.endswith('No.\n'):
>     line = line.strip()
>

Alan, using your suggestion with the code above, here is the printout:

jay-mutter-iiis-computer:~/documents/ToBePrinted jlm1$ python test.py
'andal\r'
'  No.\r'
' Dor-\r'
'  14;\r'
'315 ;\r'
'  No.\r'
'utton\r'
'H'

This appears to me to show 2 lines ending with No. where
the line ending should be removed so that the next line continues on the same line.

Again thanks for the help/suggestions

> See if that helps narrow down the cause...
>
>> This is an imac running python 2.3.5 under OS-X 10.4.9
>
> Shouldn't make any odds.
>
> Weird,
>
> Alan G.
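
For reference, the repr output above shows every line ending in a bare \r. Below is a minimal sketch of one way to join a line ending in No. with the line that follows it. It is written in Python 3 syntax (the thread itself uses Python 2.3), and the helper name join_after_marker and the sample text are made up for illustration. It relies on str.splitlines(), which treats \r, \n and \r\n all as line breaks, so the exact line ending stops mattering:

```python
def join_after_marker(text, marker="No."):
    """Join each line that ends with `marker` to the line that follows it."""
    joined = []
    pending = ""                      # text carried over from a line ending in marker
    for line in text.splitlines():    # splits on \r, \n and \r\n alike
        line = pending + line.strip()
        if line.endswith(marker):
            pending = line + " "      # hold it and glue the next line on
        else:
            joined.append(line)
            pending = ""
    if pending:                       # the text ended on a marker line
        joined.append(pending.rstrip())
    return joined


sample = "Scandal\r  Patent No.\r 1,234 Dor-\rButton\r"
print(join_after_marker(sample))
```

Working on the whole text at once avoids having to guess which of \r or \n to pass to endswith().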

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Tutor Digest, Vol 38, Issue 2

2007-04-05 Thread Jay Mutter III
>
>
> Message: 3
> Date: Sun, 1 Apr 2007 16:42:56 +0100
> From: "Alan Gauld" <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] Tutor Digest, Vol 38, Issue 1
> To: tutor@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>   reply-type=original
>
>
> "Rikard Bosnjakovic" <[EMAIL PROTECTED]> wrote
>
>>>>> s1 = "some line\n"
>>>>> s2 = "some line"
>>>>> s1.endswith("line"), s2.endswith("line")
>> (False, True)
>>
>> Just skip the if and simply rstrip the string.
>

see below

> Or add \n to the endswith() test string if you really only
> want to strip the newline in those cases
>
> Alan G.
>
>
>
> --
>
> Message: 4
> Date: Sun, 1 Apr 2007 16:46:05 +0100
> From: "Alan Gauld" <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] Tutor Digest, Vol 38, Issue 1
> To: tutor@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>   reply-type=original
>
> "Jay Mutter III" <[EMAIL PROTECTED]> wrote
>
>> inp = open('test.txt','r')
>> s = inp.readlines()
>> for line in s:
>>     if line.endswith('No.'):
>>         line = line.rstrip()
>>     print line
>
> BTW,
> You do know that you can shorten that considerably?
> With:
>
> for line in open('test.txt'):
>     if line.endswith('No.\n'):
>         line = line.rstrip()
>     print line
>

Whether I attempt to just strip the string or attempt to

if line.endswith('No.\r'):
    line = line.rstrip()

It doesn't work.
Note: I tried \n, \r and \n\r, although TextWrangler claims that the file
has Unix line endings.
When I used tr to do a few things, \n or \r worked fine.
I tried sed and it didn't work, but from the command line in sed, using
ctrl-v and ctrl-j to insert the line feed, it worked,
although I then could not figure out how to do the same in a script.
It is as if the Python interpreter doesn't recognize the escaped n
(or r) as a line feed.
This is an iMac running Python 2.3.5 under OS X 10.4.9.

Thanks again
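
One way to check what the file really contains, rather than relying on what an editor reports, is to look at the raw bytes. The following is only a sketch in Python 3 syntax; count_line_endings is a hypothetical helper name, and a bytes literal stands in for the contents of test.txt:

```python
def count_line_endings(data):
    """Count CRLF, bare CR and bare LF line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf     # CRs that are not part of a CRLF pair
    lf = data.count(b"\n") - crlf     # LFs that are not part of a CRLF pair
    return {"CRLF": crlf, "CR": cr, "LF": lf}


# In practice the bytes would come from open(path, "rb").read();
# a literal is used here so the sketch runs on its own.
print(count_line_endings(b"one\rtwo\r\nthree\nfour\r"))
```

If only the CR count is non-zero, the file uses bare carriage returns, and a test such as endswith('No.\n') can never match.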

> -- 
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
>
>
> --
>
> Message: 5
> Date: 01 Apr 2007 12:17:00 -0400
> From: "Greg Perry" <[EMAIL PROTECTED]>
> Subject: [Tutor] Communication between classes
> To: 
> Message-ID: <[EMAIL PROTECTED]>
>
> Hi again,
>
> I am still in the process of learning OOP concepts and reasons why  
> classes should be used instead of functions etc.
>
> One thing that is not apparent to me is the best way for classes to  
> communicate with each other.  For example, I have created an Args  
> class that sets a variety of internal variables (__filename,  
> __outputdir etc) by parsing the argv array from the command line.
> What would be the preferred mechanism for returning or passing  
> along those variables to another class?  Maybe by a function method  
> that returns all of those variables?
>
>
>
>
>
> --
>
> Message: 6
> Date: Sun, 01 Apr 2007 20:46:21 +0200
> From: Andrei <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] Communication between classes
> To: tutor@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Greg,
>
> Greg Perry wrote:
>> I am still in the process of learning OOP concepts and
>> reasons why classes should be used instead of functions etc.
>>
>> One thing that is not apparent to me is the best way for
>> classes to communicate with each other.  For example,
>
> Good question. Unfortunately there's no general rule that you can apply
> and end up with an indisputably perfect solution.
>
> Classes should communicate on a need-to-know basis. Take for example an
> RSS feed reader application. You may have a class representing a feed
> and a class representing a post. The feed will know what posts it
> contains, but the post probably won't know what feed it comes from. The
> interface would display a list of feeds (without knowing their
> contents), a list of posts within a feed (this needs to know both feed
> and feed contents) and the contents of a single post (knows only about
> an individual post).
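
The need-to-know idea in the feed reader example might be sketched roughly as follows. The class names Feed and Post and the methods add_post and titles are hypothetical, chosen only to mirror the description; the code is Python 3 syntax:

```python
class Post:
    """A single post; it knows nothing about the feed it came from."""
    def __init__(self, title, body):
        self.title = title
        self.body = body


class Feed:
    """A feed knows which posts it contains."""
    def __init__(self, name):
        self.name = name
        self.posts = []

    def add_post(self, post):
        self.posts.append(post)

    def titles(self):
        # The interface asks the feed for what it needs instead of
        # reaching into the feed's internals.
        return [post.title for post in self.posts]


feed = Feed("Python news")
feed.add_post(Post("Release notes", "..."))
feed.add_post(Post("Tutor digest", "..."))
print(feed.name, feed.titles())
```

The same shape would probably suit an Args class: other classes call a method or read a plain attribute on an Args instance rather than touching its double-underscore names.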
>
>> I have created an Args class that sets a variety of internal
>> variables (__filename, __outputd

Re: [Tutor] Tutor Digest, Vol 38, Issue 1

2007-04-01 Thread Jay Mutter III
Alan thanks for the response;


> Message: 8
> Date: Sun, 1 Apr 2007 08:54:02 +0100
> From: "Alan Gauld" <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] Another parsing question
> To: tutor@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>   reply-type=original
>
>
> "Jay Mutter III" <[EMAIL PROTECTED]> wrote
>
>> for line in s:
>>     jay = patno.findall(line)
>>     jay2 = "".join(jay[0])
>>     print jay2
>>
>> and it prints fine up until line 111 which is a line that had
>> previously returned [ ] since a number didn't exist on that line and
>> then exits with
>
>> IndexError: list index out of range
>
> Either catch the exception with try/except or add an
>   if not jay: continue  # or use a default string
>
>> And as long as i am writing, how can I delete a return at the end of
>> a line if the line ends in a certain pattern?
>>
>> For instance, if line ends with the abbreviation  No.
>
> if line.endswith(string): line = line.rstrip()
>

For some reason this never works for me;
I am using an Intel iMac with OS X 10.4.9, which has Python 2.3.5.

inp = open('test.txt','r')
s = inp.readlines()
for line in s:
    if line.endswith('No.'):
        line = line.rstrip()
    print line
and it never ever removes the line feed. (These are Unix \r
according to TextWrangler.)
I am beginning to think that it is a problem with readlines.

But then I thought, well, why not

inp = open('test.txt','r')
s = inp.readlines()
for line in s:
    if line.endswith('No.'):
        line += s.next()
    print line,

however that doesn't work either which leads me to believe that it is  
me and my interpretation of the above.

Thanks

Jay
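
A note on the second attempt above: s comes from readlines() and is therefore a list, and a list has no next() method; only an iterator over the list does. A minimal sketch of that idea, in Python 3 syntax with the built-in next() (the function name join_with_next and the sample lines are made up for illustration):

```python
def join_with_next(lines, marker="No."):
    """Yield lines, gluing any line that ends with `marker` to its successor."""
    it = iter(lines)                  # a list has no next(); its iterator does
    for line in it:
        line = line.rstrip()          # drop \r, \n and trailing blanks first
        while line.endswith(marker):
            nxt = next(it, None)      # None when the input is exhausted
            if nxt is None:
                break
            line = line + " " + nxt.strip()
        yield line


lines = ["Seat of motorcars. No.\n", "1,353,708; Sept. 21\n", "Barnett, Otto\n"]
for joined in join_with_next(lines):
    print(joined)
```

Stripping the line before calling endswith() is what lets the test match no matter which line ending the file uses.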



>> I want to join the current line with next line.
>> Are lists immutable or can they be changed?
>
> lists can be changed, tuples cannot.
>
> HTH,
>
> -- 
> Alan Gauld
> Author of the Learn to Program web site
> http://www.freenetpages.co.uk/hp/alan.gauld
>
>
>
>
> --
>
>
>
> End of Tutor Digest, Vol 38, Issue 1
> 



Re: [Tutor] Another parsing question

2007-03-31 Thread Jay Mutter III
Kent;
Again thanks for the help.
I am not sure if this is what you meant, but I put

for line in s:
    jay = patno.findall(line)
    jay2 = "".join(jay[0])
    print jay2

and it prints fine up until line 111 which is a line that had  
previously returned [ ] since a number didn't exist on that line and   
then exits with

Traceback (most recent call last):
  File "patentno2.py", line 12, in ?
    jay2 = "".join(jay[0])
IndexError: list index out of range


And as long as I am writing, how can I delete a return at the end of
a line if the line ends in a certain pattern?

For instance, if a line ends with the abbreviation No.
I want to join the current line with the next line.
Are lists immutable or can they be changed?

Thanks again

jay

On Mar 31, 2007, at 2:27 PM, Kent Johnson wrote:

> Jay Mutter III wrote:
>> I have the following that I am using to extract "numbers" from a file
>> ...
>> which yields the following
>> [('1', '337', '912')]
> > ...
>> So what do i have above ? A list of tuples?
>
> Yes, each line is a list containing one tuple containing three  
> string values.
>
>> How do I  send the output to a file?
>
> When you print, the values are automatically converted to strings  
> by calling str() on them. When you use p2.write(), this conversion  
> is not automatic, you have to do it yourself via
>   p2.write(str(jay))
>
> You can also tell the print statement to output to a file like this:
>   print >>p2, jay
>
>> Is there a way to get the output as
>> 1337912  instead of   [('1', '337', '912')]  ?
>
> In [4]: jay=[('1', '337', '912')]
>
> jay[0] is the tuple alone:
> In [6]: jay[0]
> Out[6]: ('1', '337', '912')
>
> Join the elements together using an empty string as the separator:
> In [5]: ''.join(jay[0])
> Out[5]: '1337912'
> In [7]:
>
> Kent
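
Putting the pieces of this reply together, a rough sketch of the whole loop could look like the following. It is Python 3 syntax, write_patent_numbers is a hypothetical name, and an io.StringIO object stands in for the output file p2. Looping over the findall() result, instead of indexing jay[0], also sidesteps the IndexError on lines without a number:

```python
import io
import re

# Same pattern as in the thread: three capture groups, so findall()
# returns a list of 3-tuples (empty when the line has no match).
patno = re.compile(r'(\d{1})[\s,.]+(\d{3})[\s,.]+(\d{3})')


def write_patent_numbers(lines, out):
    """Write one joined number per match to `out`; return how many were written."""
    written = 0
    for line in lines:
        for groups in patno.findall(line):   # no match -> loop body is skipped
            out.write("".join(groups) + "\n")
            written += 1
    return written


out = io.StringIO()                  # stands in for a file opened for writing
lines = ["No. 1,337,912; June 8", "no number on this line", "No. 1.354 756"]
print(write_patent_numbers(lines, out))
print(out.getvalue())
```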



Re: [Tutor] Another parsing question

2007-03-31 Thread Jay Mutter III
Ok after a minute of thought I did solve my second question by simply  
changing my RE to

(r'(\d{1}[\s,.]+\d{3}[\s,.]+\d{3})')

but still haven't gotten the first one.
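
A short sketch of what that change does to the value findall() returns, in Python 3 syntax with a sample line adapted from the data in this thread. With several groups findall() gives a list of tuples; with one group around the whole pattern it gives a list of strings that still contain the separators:

```python
import re

line = "Automatic display-sign. No. 1,330 411; Apr. 13"

grouped = re.compile(r'(\d{1})[\s,.]+(\d{3})[\s,.]+(\d{3})')
single = re.compile(r'(\d{1}[\s,.]+\d{3}[\s,.]+\d{3})')

print(grouped.findall(line))   # list of tuples, one string per group
print(single.findall(line))    # list of strings, separators still included

# Stripping the separators from the single-group result:
print([re.sub(r'[\s,.]', '', m) for m in single.findall(line)])
```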


On Mar 31, 2007, at 1:39 PM, Jay Mutter III wrote:

> I have the following that I am using to extract "numbers" from a file
>
>
> prompt1 = raw_input('What is the file from which you would like a  
> list of patent numbers?  ')
> p1 = open(prompt1,'rU')
> s = p1.readlines()
> prompt2 = raw_input('What is the name of the file to which you  
> would like to save the list of patent numbers?  ')
> p2 = open(prompt2,'aU')
> patno = re.compile(r'(\d{1})[\s,.]+(\d{3})[\s,.]+(\d{3})')
> for line in s:
>     jay = patno.findall(line)
>     print jay
>
> which yields the following
>
> [('1', '337', '912')]
> [('1', '354', '756')]
> [('1', '360', '297')]
> [('1', '328', '232')]
> [('1', '330', '123')]
> [('1', '362', '944')]
> [('1', '350', '461')]
> [('1', '355', '991')]
> [('1', '349', '385')]
> [('1', '350', '521')]
> [('1', '336', '542')]
> [('1', '354', '922')]
> [('1', '338', '268')]
> [('1', '353', '682')]
> [('1', '343', '241')]
> [('1', '359', '852')]
> [('1', '342', '483')]
> [('1', '347', '068')]
> [('1', '331', '450')]
>
> If I try to write to a file instead of printing to the screen using
> p2.write(jay)
> I get the message
>
> Traceback (most recent call last):
>   File "patentno.py", line 12, in ?
>     p2.write(jay)
> TypeError: argument 1 must be string or read-only character buffer, not list
>
> If I try writelines I get
>
> Traceback (most recent call last):
>   File "patentno.py", line 12, in ?
>     p2.writelines(jay)
> TypeError: writelines() argument must be a sequence of strings
> jay-mutter-iiis-computer:~/documents/programming/python/patents jlm1$
>
>
> So what do I have above? A list of tuples?
>
> How do I  send the output to a file?
> Is there a way to get the output as
>
> 1337912  instead of   [('1', '337', '912')]  ?
>
> And as always thanks in advance for the help.
>
> jay Mutter
>



[Tutor] Another parsing question

2007-03-31 Thread Jay Mutter III
I have the following that I am using to extract "numbers" from a file


import re

prompt1 = raw_input('What is the file from which you would like a list of patent numbers?  ')
p1 = open(prompt1, 'rU')
s = p1.readlines()
prompt2 = raw_input('What is the name of the file to which you would like to save the list of patent numbers?  ')
p2 = open(prompt2, 'a')  # 'U' only applies to reading, so plain append mode
patno = re.compile(r'(\d{1})[\s,.]+(\d{3})[\s,.]+(\d{3})')
for line in s:
    jay = patno.findall(line)
    print jay

which yields the following

[('1', '337', '912')]
[('1', '354', '756')]
[('1', '360', '297')]
[('1', '328', '232')]
[('1', '330', '123')]
[('1', '362', '944')]
[('1', '350', '461')]
[('1', '355', '991')]
[('1', '349', '385')]
[('1', '350', '521')]
[('1', '336', '542')]
[('1', '354', '922')]
[('1', '338', '268')]
[('1', '353', '682')]
[('1', '343', '241')]
[('1', '359', '852')]
[('1', '342', '483')]
[('1', '347', '068')]
[('1', '331', '450')]

If I try to write to a file instead of printing to the screen using
p2.write(jay)
I get the message

Traceback (most recent call last):
  File "patentno.py", line 12, in ?
    p2.write(jay)
TypeError: argument 1 must be string or read-only character buffer, not list

If I try writelines I get

Traceback (most recent call last):
  File "patentno.py", line 12, in ?
    p2.writelines(jay)
TypeError: writelines() argument must be a sequence of strings
jay-mutter-iiis-computer:~/documents/programming/python/patents jlm1$


So what do I have above? A list of tuples?

How do I  send the output to a file?
Is there a way to get the output as

1337912  instead of   [('1', '337', '912')]  ?

And as always thanks in advance for the help.

jay Mutter



Re: [Tutor] Tutor Digest, Vol 37, Issue 63

2007-03-25 Thread Jay Mutter III
>
> Message: 1
> Date: Sat, 24 Mar 2007 16:41:22 -0700 (PDT)
> From: Jaggo <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] Tutor Digest, Vol 37, Issue 62
> To: tutor@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Message: 2
> Date: Sat, 24 Mar 2007 19:25:10 -0400
> From: Jay Mutter III
> Subject: [Tutor] parsing text
> [...]
> 1.) when i do readlines and create a list and then print the list it
> adds a blank line between every line of text
> [...]
> ideas?
>
> Thanks again
>
> jay
> Well,
> regarding your first question:
> "print string" automatically breaks a line at the end of string.  
> Use "print string," instead [note that trailin' , .]
>

yes, thank you for that
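
For anyone reading this later with Python 3, where print is a function, the trailing-comma trick is spelled end="". The following is only an illustrative sketch; io.StringIO objects are used to capture the output so that the doubled newlines are visible in the repr:

```python
import io

lines = ["first line\n", "second line\n"]   # what readlines() would return

doubled = io.StringIO()
for line in lines:
    print(line, file=doubled)               # print adds its own "\n"

single = io.StringIO()
for line in lines:
    print(line, end="", file=single)        # Python 3 spelling of "print line,"

print(repr(doubled.getvalue()))
print(repr(single.getvalue()))
```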


> [I'm not sure about your n. 2, that's why no answer is included.]
>
>
>
> --
>
> Message: 2
> Date: Sun, 25 Mar 2007 00:00:29 -
> From: "Alan Gauld" <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] parsing text
> To: tutor@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>   reply-type=original
>
> "Jay Mutter III" <[EMAIL PROTECTED]> wrote
>
>> i have the following text:
>>
>> Barnett, John B., assignor of one-half to R. N. Tutt, Kansas City,
>> Mo.Automatic display-sign.No. 1,330 411-Apr. 13 ; v. 273 ;
>> p.
>> 193.
>> Barnett,  John  II..  Tettenhall,  England. Seat  of
>> motorcars.No. 1.353,708; Sept. 21 ; v. 278; p. 487. Barnett,
>> Otto
>> R.(See Scott, John M., assignor.)
>>
>> 1.) when i do readlines and create a list and then print the list it
>> adds a blank line between every line of text
>
> I suspect that's because you are reading a newline character
> from the file and print adds a newline of its own. You need to
> use rstrip() to take out the newline from the file.
>
>> 2.)in the second line after p.487 there is the beginning of a new
>> line of data only it isn't on a newline.
>
> I'm not quite sure what you mean here.
> It would be helpful if you can show us the problematic output
> as well as the input. Also to send us the actual code fragments
> that are causing the damage.

Yes, after I received the reply I realized that I was not very clear.
I have a text file of inventors which should have one inventor on
each line in alphabetical order, but of course the lines
do not break at the end of 'p. xxx.' (where p. xxx is the relevant
page number).

I read the data in as a string, figuring that I could then replace p.
xxx with a carriage return, somehow write the data out to a text file,
and the problem would be solved. Not quite so simple given my limited
skill set.
The following is what I put in (interactively) and what I got out.

 >>> ss = open('inp.txt')
 >>> s = ss.read()
 >>> s.replace('p. ','\n')
'Barnett, John B., assignor of one-half to R. N. Tutt, Kansas City,  
Mo. Automatic display-sign.\xc2\xa0 \xc2\xa0 No. 1,330 411-Apr. 13 ;  
v. 273 ;\xc2\xa0\n\n193. Barnett,\xc2\xa0 John\xc2\xa0 II..\xc2\xa0  
Tettenhall,\xc2\xa0 England. \xc2\xa0 \xc2\xa0 Seat\xc2\xa0 of 
\nmotorcars.\xc2\xa0 \xc2\xa0 No. 1.353,708; Sept. 21 ; v. 278;  
\n487. Barnett,\xc2\xa0\nOtto R.\xc2\xa0 \xc2\xa0 (See Scott, John  
M., assignor.)'
 >>>

I thought about treating it as a list of lines, stripping carriage
returns on the basis of some criteria, but I have never gotten rstrip
to work.
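
Two things may be worth noting about the interactive session above. The interpreter echoes the repr of the result, so a real newline shows up as the two characters \n, and replace() returns a new string without changing s. A hedged sketch of one way to break the text after each page reference while keeping the reference itself, in Python 3 syntax (the regular expression is my own guess at what counts as a page reference, and the sample is a cleaned-up excerpt of the data):

```python
import re

s = ("Mo. Automatic display-sign. No. 1,330 411-Apr. 13 ; v. 273 ; p. 193. "
     "Barnett, John II.. Tettenhall, England. Seat of motorcars. "
     "No. 1.353,708; Sept. 21 ; v. 278; p. 487. Barnett, Otto R.")

# Strings are immutable: replace()/sub() return a new string,
# so the result has to be assigned to a name to be kept.
# Keep "p. NNN." and turn the whitespace after it into a newline.
records = re.sub(r'(p\.\s*\d+\.)\s*', r'\1\n', s)

for record in records.splitlines():
    print(record)
```

Writing records to a file with write() would then put one inventor per line, at least for entries that end in a page reference.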



>
>> i tried string.replace(s,'p.','\n') in an attempt to put a CR in but
>> it just put the characters\n in the string.
>
> Dont use the string module functions. Use the string methods,
> so it becomes:
>
> s.replace('p.', '\n')
>
> However that doesn't explain why you are getting the literal
> characters! Can you send us the actual code you are using?
> And the output showing the error?
>
> HTH,
>
> Alan G.
>
>
>
>
> --
>
> Message: 3
> Date: Sat, 24 Mar 2007 19:32:36 -0500
> From: "Cecilia Alm" <[EMAIL PROTECTED]>
> Subject: [Tutor] No need to seed random?
> To: tutor@python.org
> Message-ID:
>   <[EMAIL PROTECTED]>
> Content-Type: tex

[Tutor] parsing text

2007-03-24 Thread Jay Mutter III
Kent thanks for this as I was clearly confused with regards to string  
and list of strings.
I am, however, still having difficulty with how to solve a problem  
involving a related issue.


I have the following text:

Barnett, John B., assignor of one-half to R. N. Tutt, Kansas City,  
Mo.Automatic display-sign.No. 1,330 411-Apr. 13 ; v. 273 ; p.  
193.
Barnett,  John  II..  Tettenhall,  England. Seat  of   
motorcars.No. 1.353,708; Sept. 21 ; v. 278; p. 487. Barnett, Otto  
R.(See Scott, John M., assignor.)

Barnett. Otto R. (See Sponenburg, Hiram H., assignor)
Barnett, William A., Lincoln. Nebr.Attachment for garment- 
turning   machines. No.   1,342,937;   June   8 ?   v 270 ; p. 313."
Barnhart, Clarence D., Brooklyn, assignor to W. S. Rockwell Company,  
New York. N. Y.Conveyer for furnaces No. 1.333.371 ; Mar. 9 ; v.  
272 ; p. 278.
Barnhart, Clarence v., Waynesboro, Pa., assignor to J. K. Hoffman and  
W. M. Raeclitel.  Hagerstowu, Md. Seed-planter.No. 1,357.43S:  
Nov. 2; v. 280: p. 45.

Barnhart, John E.(See Haves, J. P.. and Barnhart )
Barnhart,-Mollie E.(See Freeman. Alpheus J., assignor) Barnhill,  
E. B., and J. Stone, Indianapolis, Ind.Auto-tire 477513


1.) When I do readlines and create a list and then print the list, it
adds a blank line between every line of text.
2.) In the second line, after p. 487, there is the beginning of a new
line of data, only it isn't on a new line.
I tried string.replace(s,'p.','\n') in an attempt to put a CR in, but
it just put the characters \n in the string.


ideas?

Thanks again

jay



Jay Mutter III wrote:
> Thanks for the response
> Actually the number of lines this returns is the same number of lines
> given when i put it in a text editor (TextWrangler).
> Luke had mentioned the same thing earlier but when I do change read to
> readlines I get the following
>
>
> Traceback (most recent call last):
>   File "extract_companies.py", line 17, in ?
>     count = len(text.splitlines())
> AttributeError: 'list' object has no attribute 'splitlines'

I think maybe you are confused about the difference between "all the
text of a file in a single string" and "all the lines of a file in a
list of strings."

When you open() a file and read() the contents, you get all the text of
a file in a single string. len() will give you the length of the string
(the total file size) and iterating over the string gives you one
character at a time.

Here is an example of a string:
In [1]: s = 'This is text'
In [2]: len(s)
Out[2]: 12
In [3]: for i in s:
...: print i
...:
...:
T
h
i
s

i
s

t
e
x
t

On the other hand, if you open() the file and then readlines() from the
file, the result is a list of strings, each of which is the contents of
one line of the file, up to and including the newline. len() of the list
is the number of lines in the list, and iterating the list gives each
line in turn.

Here is an example of a list of strings:
In [4]: l = [ 'line1', 'line2' ]
In [5]: len(l)
Out[5]: 2
In [6]: for i in l:
   ...:     print i
   ...:
   ...:
line1
line2

Notice that s and l are *used* exactly the same way with len() and for,
but the results are different.

As a further wrinkle, there are two easy ways to get all the lines in a
file and they give slightly different results.

open(...).readlines() returns a list of lines in the file and each line
includes the final newline if it was in the file. (The last line will
not include a newline if the last line of the file did not.)

open(...).read().splitlines() also gives a list of lines in the file,
but the newlines are not included.

HTH,
Kent
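
The three ways of reading described above can be compared side by side in a small sketch. It is Python 3 syntax, and an io.StringIO object stands in for the file returned by open(...) so that the example runs without a file on disk:

```python
import io

f = io.StringIO("line1\nline2\nlast line without newline")   # stands in for open(...)

text = f.read()                 # one string holding the whole file
f.seek(0)
lines = f.readlines()           # list of strings, newlines kept

print(len(text), len(lines))
print(lines)
print(text.splitlines())        # list of strings, newlines dropped
```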





[Tutor] Parsing text file with Python

2007-03-23 Thread Jay Mutter III
The script I have to date is below, and
thanks to your help I can see some daylight, but I still have a few
questions:

1.) Are there better ways to write this?
2.) As it writes out the one group to the new file for companies, it
is as if it leaves blank lines behind, for if I don't have the
elif len(line) > 1 the inventor's file has blank lines in it.
3.) I reopened the inventor's file to get a count of lines, but is
there a better way to do this?

Thanks



in_filename = raw_input('What is the COMPLETE name of the file you would like to process?')
in_file = open(in_filename, 'rU')
text = in_file.readlines()
count = len(text)
print "There are ", count, 'lines to process in this file'
out_filename1 = raw_input('What is the COMPLETE name of the file in which you would like to save Companies?')
companies = open(out_filename1, 'a')  # 'U' only applies to reading
out_filename2 = raw_input('What is the COMPLETE name of the file in which you would like to save Inventors?')
patentdata = open(out_filename2, 'a')
for line in text:
    if line.endswith(')\n'):
        companies.write(line)
    elif line.endswith(') \n'):
        companies.write(line)
    elif len(line) > 1:
        patentdata.write(line)
in_file.close()
companies.close()
patentdata.close()
in_filename2 = raw_input('What was the name of the inventor\'s file ?')
in_file2 = open(in_filename2, 'rU')
text2 = in_file2.readlines()
count = len(text2)
print "There are - well until we clean up more - approximately ", count, 'inventors in this file'
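
On questions 1 and 3, one possible restructuring is to do the sorting in a function that returns two lists, so the inventor count is simply the length of one list and nothing has to be reopened. This is a sketch only, in Python 3 syntax; split_records and the sample lines are hypothetical, and unlike the script above it strips trailing whitespace from the lines it keeps:

```python
def split_records(lines):
    """Split lines into (companies, inventors); blank lines are dropped."""
    companies, inventors = [], []
    for line in lines:
        stripped = line.rstrip()          # removes the newline and trailing blanks
        if not stripped:
            continue                      # skip blank lines entirely
        if stripped.endswith(")"):        # covers both ")\n" and ") \n"
            companies.append(stripped)
        else:
            inventors.append(stripped)
    return companies, inventors


sample = [
    "A-N Company, The. (See Alexander and Nash, assignors.)\n",
    "\n",
    "Barnett, William A., Lincoln, Nebr. Attachment; p. 313.\n",
    "Barnhart, John E. (See Hayes, J. P., and Barnhart.) \n",
]
companies, inventors = split_records(sample)
print(len(companies), "company lines,", len(inventors), "inventor lines")
```

Each list could then be written out with "\n".join(...) plus a final newline, which keeps the file handling separate from the sorting logic.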


Re: [Tutor] Why is it...

2007-03-23 Thread Jay Mutter III

Got it - it needs the blank line to signal that code block has ended.
Thanks

On Mar 22, 2007, at 3:05 PM, Jason Massey wrote:


In the interpreter this doesn't work:

>>> f = open(r"c:\python24\image.dat")
>>> line = f.readline()
>>> while line:
...     line = f.readline()
... f.close()
Traceback (  File "", line 3
f.close()
^
SyntaxError: invalid syntax

But this does:

>>> f = open(r"c:\python24\image.dat")
>>> line = f.readline()
>>> while line:
...     line = f.readline()
...
>>> f.close()
>>>

Note the differing placement of the f.close() statement, it's not  
part of the while.
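
The difference between the prompt and a script can be reproduced, as far as I understand it, with the built-in compile(): "exec" mode compiles a whole module the way a script is handled, while "single" mode compiles one interactive statement. A small sketch in Python 3 syntax, with a hypothetical helper named compiles:

```python
source = "while x:\n    x = 0\nprint('done')\n"


def compiles(src, mode):
    """Return True if `src` compiles in the given mode, False on SyntaxError."""
    try:
        compile(src, "<example>", mode)
        return True
    except SyntaxError:
        return False


print(compiles(source, "exec"))     # how a script file is compiled
print(compiles(source, "single"))   # how the interactive prompt compiles input
```

In "single" mode the block has to be finished before the next statement starts, which is the job the blank line does at the prompt.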



On 3/22/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
Jay Mutter III wrote:
> Why is it that when I run the following interactively
>
> f = open('Patents-1920.txt')
> line = f.readline()
> while line:
>  print line,
>  line = f.readline()
> f.close()
>
> I get an error message
>
> File "", line 4
>  f.close()
>  ^
> SyntaxError: invalid syntax
>
> but if i run it in a script there is no error?

Can you copy/paste the actual console transcript?

BTW a better way to write this is
f = open(...)
for line in f:
 print line,
f.close()

Kent

>
> Thanks
>
> Jay
>
>



Re: [Tutor] Another string question

2007-03-23 Thread Jay Mutter III

Andre;

Thanks again for the assistance.

I have corrected the splitlines error and it works (well, that part
of it anyway) correctly now.


On Mar 23, 2007, at 5:30 AM, Andre Engels wrote:


2007/3/22, Jay Mutter III <[EMAIL PROTECTED]>:
I wanted the following to check each line and if it ends in a right
parentheses then write the entire line to one file and if not then
write the line to another.
It wrote all of the ) to one file and the rest of the line (ie minus
the ) to the other file.

The line:
 print "There are ", count, 'lines to process in this file'
should give you a hint - don't you think this number was rather high?

The problem is that if you do "for line in text" with text being a  
string, it will not loop over the _lines_  in the string, but over  
the _characters_ in the string.


The easiest solution would be to replace
 text = in_file.read()
by
 text = in_file.readlines()

in_filename = raw_input('What is the COMPLETE name of the file you
would like to process?')
in_file = open(in_filename, 'rU')
text = in_file.read()
count = len(text.splitlines())
print "There are ", count, 'lines to process in this file'
out_filename1 = raw_input('What is the COMPLETE name of the file in
which you would like to save Companies?')
companies = open(out_filename1, 'aU')
out_filename2 = raw_input('What is the COMPLETE name of the file in
which you would like to save Inventors?')
patentdata = open(out_filename2, 'aU')
for line in text:
 if line[-1] in ')':
 companies.write(line)
 else:
 patentdata.write(line)
in_file.close()
companies.close()
patentdata.close()

Thanks

jay



--
Andre Engels, [EMAIL PROTECTED]
ICQ: 6260644  --  Skype: a_engels




Re: [Tutor] Should I use python for parsing text?

2007-03-23 Thread Jay Mutter III
First thanks for all of the help
I am actually starting to see the light.

On Mar 22, 2007, at 7:51 AM, Kent Johnson wrote:

> Jay Mutter III wrote:
>> Kent;
>> Thanks for the reply on tutor-python.
>> My data file which is just a .txt file created under WinXP by an  
>> OCR program contains lines like:
>> A.-C. Manufacturing Company. (See Sebastian, A. A.,
>> and Capes, assignors.)
>> A. G. A. Railway Light & Signal Co. (See Meden, Elof
>> H„ assignor.)
>> A-N Company, The. (See Alexander and Nasb, as-
>> signors.;
>> AN Company, The. (See Nash, It. J., and Alexander, as-
>> signors.)
>> I use an Intel iMac running OS X 10.4.9 and when I used Python to
>> append one file to another I got a file that opened in OS X's
>> TextEdit program with characters that looked like Japanese/Chinese
>> characters.
>> When i pasted them into my mail client (OS X's mail) they were  
>> then just a sequence of question marks so I am not sure what  
>> happened.
>> Any thoughts???
>
> For some reason, after you run the Python program, TextEdit thinks
> the file is not ascii data; it seems to think it is utf-8 or a  
> Chinese encoding. Your original email was utf-8 which points in  
> that direction but is not conclusive.
>
> If you zip up and send me the original file and the cleandata.txt  
> file *exactly as it is produced* by the Python program - not edited  
> in any way - I will take a look and see if I can guess what is  
> going on.
>>

You are correct that it was utf-8.
Multiple people were scanning pages and converting to text; some
saved as ASCII and some saved as Unicode.
The sample used above was utf-8, so after your comment I checked all of them,
saved everything as ASCII, combined all the pieces into one file and
normalized the line endings to Unix style.


>> And i tried  using the following on the above data:
>> in_filename = raw_input('What is the COMPLETE name of the file you  
>> want to open:')
>> in_file = open(in_filename, 'r')
>
> It wouldn't hurt to use universal newlines here since you are  
> working cross-platform:
>   open(in_filename, 'Ur')
>

corrected this

>> text = in_file.readlines()
>> num_lines = text.count('\n')
>
> Here 'text' is a list of lines, so text.count('\n') is counting the  
> number of blank lines (lines containing only a newline) in your  
> file. You should use
>   num_lines = len(text)
>

changed


>> print 'There are', num_lines, 'lines in the file', in_filename
>> output = open("cleandata.txt","a")# file for writing data to  
>> after stripping newline character
>
> I agree with Luke, use 'w' for now to make sure the file has only  
> the output of this program. Maybe something already in the file is  
> making it look like utf-8...
>
>> # read file, copying each line to new file
>> for line in text:
>>     if len(line) > 1 and line[-2] in ';,-':
>>         line = line.rstrip()
>>         output.write(line)
>>     else: output.write(line)
>> print "Data written to cleandata.txt."
>> # close the files
>> in_file.close()
>> output.close()
>> As written above it tells me that there are 0 lines which is  
>> surprising because if I run the first part by itself it tells  
>> there are 1982 lines ( actually 1983 so i am figuring EOF)
>> It copies/writes the data to the cleandata file but it does not  
>> strip out CR and put data on one line ( a sample of what i am  
>> trying to get is next)
>> A.-C. Manufacturing Company. (See Sebastian, A. A., and Capes,  
>> assignors.)
>> My apologies if i have intruded.
>
> Please reply on-list in the future.
>
> Kent
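
The point about text.count('\n') reporting 0 lines can be seen in a small sketch (Python 3 syntax, sample lines adapted from the data above). On a list, count() looks for whole elements equal to its argument; on a string it looks for occurrences of the substring:

```python
lines = ["A.-C. Manufacturing Company.\n", "\n", "and Capes, assignors.)\n", "\n"]

# list.count(x) counts elements equal to x, so on a list of lines
# count('\n') only counts the lines that are nothing but a newline.
blank_lines = lines.count("\n")
total_lines = len(lines)

# On a single string, count('\n') counts newline characters instead.
text = "".join(lines)
newline_chars = text.count("\n")

print(blank_lines, total_lines, newline_chars)
```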



Re: [Tutor] Another string question

2007-03-23 Thread Jay Mutter III


On Mar 23, 2007, at 5:30 AM, Andre Engels wrote:


2007/3/22, Jay Mutter III <[EMAIL PROTECTED]>:
I wanted the following to check each line and if it ends in a right
parentheses then write the entire line to one file and if not then
write the line to another.
It wrote all of the ) to one file and the rest of the line (ie minus
the ) to the other file.

The line:
 print "There are ", count, 'lines to process in this file'
should give you a hint - don't you think this number was rather high?

The problem is that if you do "for line in text" with text being a  
string, it will not loop over the _lines_  in the string, but over  
the _characters_ in the string.


The easiest solution would be to replace
 text = in_file.read()
by
 text = in_file.readlines()



Thanks for the response.
Actually the number of lines this returns is the same number of lines
given when I put it in a text editor (TextWrangler).
Luke had mentioned the same thing earlier, but when I do change read
to readlines I get the following



Traceback (most recent call last):
  File "extract_companies.py", line 17, in ?
    count = len(text.splitlines())
AttributeError: 'list' object has no attribute 'splitlines'




in_filename = raw_input('What is the COMPLETE name of the file you would like to process?')
in_file = open(in_filename, 'rU')
text = in_file.read()
count = len(text.splitlines())
print "There are ", count, 'lines to process in this file'
out_filename1 = raw_input('What is the COMPLETE name of the file in which you would like to save Companies?')
companies = open(out_filename1, 'aU')
out_filename2 = raw_input('What is the COMPLETE name of the file in which you would like to save Inventors?')
patentdata = open(out_filename2, 'aU')
for line in text:
    if line[-1] in ')':
        companies.write(line)
    else:
        patentdata.write(line)
in_file.close()
companies.close()
patentdata.close()

Thanks

jay



--
Andre Engels, [EMAIL PROTECTED]
ICQ: 6260644  --  Skype: a_engels




[Tutor] Another string question

2007-03-22 Thread Jay Mutter III
I wanted the following to check each line and, if it ends in a right
parenthesis, write the entire line to one file and, if not,
write the line to another.
Instead it wrote all of the ) characters to one file and the rest of each
line (i.e. minus the ")") to the other file.


in_filename = raw_input('What is the COMPLETE name of the file you would like to process?')
in_file = open(in_filename, 'rU')
text = in_file.read()
count = len(text.splitlines())
print "There are ", count, 'lines to process in this file'
out_filename1 = raw_input('What is the COMPLETE name of the file in which you would like to save Companies?')
companies = open(out_filename1, 'aU')
out_filename2 = raw_input('What is the COMPLETE name of the file in which you would like to save Inventors?')
patentdata = open(out_filename2, 'aU')
for line in text:
    if line[-1] in ')':
        companies.write(line)
    else:
        patentdata.write(line)
in_file.close()
companies.close()
patentdata.close()

Thanks

jay


[Tutor] Why is it...

2007-03-22 Thread Jay Mutter III
Why is it that when I run the following interactively

f = open('Patents-1920.txt')
line = f.readline()
while line:
    print line,
    line = f.readline()
f.close()

I get an error message

  File "<stdin>", line 4
    f.close()
    ^
SyntaxError: invalid syntax

but if I run it in a script there is no error?

Thanks

Jay
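
A likely explanation, offered as a sketch rather than a confirmed answer from the thread: at the interactive prompt a compound statement such as the while loop has to be ended with a blank line before the next statement is typed, whereas a script is compiled as a whole. The built-in compile() function has a 'single' mode that follows the interactive rule and an 'exec' mode that follows the script rule, so the two behaviours can be compared directly. The source string and the helper name compiles are invented stand-ins for the loop above.

```python
# A loop followed directly by another statement, with no blank line between.
source = "while 0:\n    pass\nx = 1\n"


def compiles(src, mode):
    """Return True if src compiles in the given mode, False on SyntaxError."""
    try:
        compile(src, "<demo>", mode)
        return True
    except SyntaxError:
        return False


as_script = compiles(source, "exec")         # whole-file rules
as_interactive = compiles(source, "single")  # interactive-prompt rules
```

In practice this means pressing Enter on an empty line after the loop body, so that the prompt returns to >>> before f.close() is typed.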



[Tutor] Should I use python for parsing text

2007-03-20 Thread Jay Mutter III

"Jay Mutter III"  wrote

> See example  next:
> A.-C. Manufacturing Company. (See Sebastian, A. A.,
> and Capes, assignors.)
> ...
> Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
> Jan. 27 ; v. 270 ; p. 554.
>
> For instance, I would like to go to end of line and if last
> character  is a comma or semicolon or hyphen then
> remove the CR.

It would look something like:

output = open('example.fixed','w')
for line in file('example.txt'):
    if line[-1] in ',;-':        # check last character
        line = line.strip()      # lose the C/R
        output.write(line)       # write to output
    else: output.write(line)     # append the next line complete with C/R
output.close()
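
A runnable sketch of the same idea follows, with one adjustment that is an assumption about the intent rather than part of the original suggestion: a line read from a file still ends with its line break, so line[-1] is the newline itself and the test has to look at the text before it. The function name join_continuations and the sample lines are hypothetical.

```python
def join_continuations(lines, markers=',;-'):
    """Drop the line break after any line whose text ends in one of markers."""
    out = []
    for line in lines:
        body = line.rstrip('\r\n')        # text without its line ending
        if body and body[-1] in markers:  # check the last real character
            out.append(body)              # lose the line break
        else:
            out.append(line)              # keep the line complete with its break
    return ''.join(out)


sample = [
    "Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;\n",
    "Jan. 27 ; v. 270 ; p. 554.\n",
]
joined = join_continuations(sample)
```

Like the original, this joins the two pieces without inserting a space and leaves a trailing hyphen in place; whether hyphenated words should be glued back together is a separate decision.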




Working from the above suggestion (and thank you very much, I did
enjoy your online tutorial), I came up with the following:

import os
import sys
import re
import string

# The next 5 lines are so I have an idea of how many lines I started with in the file.

in_filename = raw_input('What is the COMPLETE name of the file you want to open:')

in_file = open(in_filename, 'r')
text = in_file.read()
num_lines = text.count('\n')
print 'There are', num_lines, 'lines in the file', in_filename

output = open("cleandata.txt","a")  # file for writing data to after stripping newline character

# read file, copying each line to new file
for line in text:
    if line[:-1] in '-':
        line = line.rstrip()
        output.write(line)
    else: output.write(line)

print "Data written to cleandata.txt."

# close the files
in_file.close()
output.close()

The above ran with no errors and gave me the number of lines in my
original file, but when I opened the cleandata.txt file I got:

A.-C.䴀愀渀甀昀愀挀琀甀爀椀渀最Company.⠀匀攀攀 
Sebastian,䄀⸀A.,and䌀愀瀀攀猀Ⰰassignors.)A.䜀⸀A.刀 
愀椀氀眀愀礀Light☀Signal䌀漀⸀(See䴀攀搀攀渀 
ⰀElofHassignor.)A-N䌀漀洀瀀愀渀礀ⰀThe.⠀匀攀攀 
Alexander愀渀搀Nasb,愀猀ⴀ猀椀最渀漀爀猀⸀㬀䄀一 
Company,吀栀攀⸀(See一愀猀栀ⰀIt.䨀⸀Ⰰand䄀氀攀砀 
愀渀搀攀爀Ⰰas-


So what did I do to cause all of the strange characters?
Plus, since this goes on, it is as if it removed all \n and not just
the ones after a hyphen, which I was using as my test case.


Thanks again.

Jay
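
Two things seem to be going on here; the following is a sketch of a probable cause, not a diagnosis confirmed in the thread. First, text is one big string, so the loop visits single characters, and for a one-character string line[:-1] is the empty string, which counts as being "in" every string. The test is therefore always true, and every whitespace character, including every \n, gets stripped. Second, assuming the input file was saved as UTF-16 (two bytes per character, which the alternating readable and unreadable words suggest), removing single whitespace bytes shifts the byte pairs out of alignment. The function buggy_clean and the sample text are made up to reproduce the effect, and the code is written for Python 2.6+ or Python 3 rather than 2.3.

```python
def buggy_clean(data):
    """Mimic the loop above on raw bytes, one byte at a time."""
    out = []
    for i in range(len(data)):
        ch = data[i:i + 1]   # a single byte, like each "line" in the loop
        if ch[:-1] in b'-':  # ch[:-1] is empty, so this is always true
            ch = ch.rstrip() # turns a space or newline byte into nothing
        out.append(ch)
    return b''.join(out)


original = u"A. B\n".encode('utf-16-be')  # hypothetical UTF-16 file content
cleaned = buggy_clean(original)           # the space and newline bytes are gone
garbled = cleaned.decode('utf-16-be')     # "B" now decodes as a CJK character
```

Opening the input with the right encoding, for example codecs.open(in_filename, 'r', 'utf-16'), and iterating over real lines should avoid both effects, though the actual encoding of the file is a guess.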



> Then move line by line through the file and delete everything
> after a  numerical sequence

Slightly more tricky because you need to use a regular expression.
But if you know regex then only slightly.
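
As a rough illustration of the regular-expression step, here is a minimal sketch. It assumes that the "numerical sequence" means the patent number written as "No." followed by comma-separated digits; the names PATENT_NO and cut_after_number are invented for the example.

```python
import re

# Assumption: the "numerical sequence" is the patent number, written as
# "No." followed by digits with comma separators, e.g. "No. 1,329,155".
PATENT_NO = re.compile(r'No\.\s*\d[\d,]*')


def cut_after_number(line):
    """Return line truncated just after the first patent number, if any."""
    match = PATENT_NO.search(line)
    if match is None:
        return line  # no number found: leave the line untouched
    return line[:match.end()].rstrip(',')


entry = "Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ; Jan. 27 ; v. 270 ; p. 554."
trimmed = cut_after_number(entry)
```

This only works once continuation lines have been joined, since in the sample data "No." and the number are sometimes split across two lines.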

>  I am wondering if Python would be a good tool

Absolutely, it's one of the areas where Python excels.

> find information on how to accomplish this

You could check  my tutorial on the three topics:

Handling text
Handling files
Regular Expressions.

Also the standard python documentation for the general tutorial
(assuming you've done basic programming in some other language
before) plus the re module

> using something like the unix tool awk or something else??

awk or sed could both be used, but Python is more generally
useful so unless you already know awk I'd take the time to
learn the basics of Python (a few hours maybe) and use that.

--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld


[Tutor] Should I use python for parsing text

2007-03-11 Thread Jay Mutter III
I am using an Intel iMac with OS X 10.4.8.
It has Python 2.3.5.

My issue is that I have a lot of text (about 500 pages at the
moment) that I need to parse so that I can eliminate info I don't
need, break the remainder into fields, and put it in a database/spreadsheet.
See the example below:

A.-C. Manufacturing Company. (See Sebastian, A. A.,
and Capes, assignors.)
A. G. A. Railway Light & Signal Co. (See Meden, Elof
H„ assignor.)
A-N Company, The. (See Alexander and Nasb, as-
signors.;
AN Company, The. (See Nash, It. J., and Alexander, as-
signors.)
A/S. Arendal Smelteverk.(See Kaaten, Einar, assignor.)
A/S. Bjorgums Gevaei'kompani. (See Bjorguni, Nils, as-
signor.)
A/S  Mekano. (Sec   Schepeler,   Herman  A.,  assignor.)
A/S Myrens Verkstad.(See Klling, Jens W. A., assignor.)
A/S Stordo Kisgruber. (See Nielsen, C., and Ilelleland,
assignors.)
A-Z Company, The.'See llanmer, Laurence G., assignor.)
Aagaard, Carl L., Rockford, 111. Hand scraping tool. No.
1,345,058 ; July 6; v. 276 ; p. 05.
Aalborg, Christian, Wllkinsburg, Pa., assignor to Wcst-
inghouse Electric and Manufacturing Company. Trol-
ley.No. 1,334,943 ; Mar. 30 ; v. 272 ; p. 741.
Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
Jan. 27 ; v. 270 ; p. 554.

For instance, I would like to go to the end of each line and, if the last
character is a comma, semicolon, or hyphen, remove the CR.
Then move line by line through the file and delete everything after a
numerical sequence.

I am wondering if Python would be a good tool and, if so, where I can
find information on how to accomplish this, or whether I would be better off
using something like the Unix tool awk or something else?

Thanks

Jay