Re: [Tutor] Simple text file processing using fileinput module. Grabbing successive lines failure

2012-07-03 Thread Flynn, Stephen (L P - IT)
 On 02/07/12 18:39, David Rock wrote:
 
  Essentially, your problem isn't with using fileinput, it's with how you
  handle each line that comes in.
 
 The immediate problem is with mis-using fileinput.
 But once you solve that you then have to deal with the
 other issues David raises.
 
 Once more I recommend the csv module...


Thanks gents - seems the CSV module does everything I require so I'll
get tinkering with it.

Steve.



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Simple text file processing using fileinput module. Grabbing successive lines failure

2012-07-03 Thread Peter Otten
Flynn, Stephen (L  P - IT) wrote:

 Tutors,
 
 Whilst having a play around with reading in textfiles and reformatting
 them I tried to write a python 3.2 script to read a CSV file, looking for
 any records which were short (indicating that the data may well contain an
 embedded CR/LF). I've attached a small sample file with a split record at
 line 3, and my code.
 
 Call the code with
 
 Python pipesmoker.py MyFile.txt ,
 
 (first parameter is the file being read, second parameter is the field
 separator... a comma in this case)
 
 I can read the file in, I can determine that I'm looking for records which
 have 13 fields and I can find a record which is too short (line 3).
 
 What I can't do is read the successive line to a short line in order to
 append it onto the end of short line before writing the entire amended
 line out. I'm still thinking about how to persuade the fileinput module to
 leap over the successor line so it doesn't get processed again.
 
 When I run the code as it stands, I get a traceback as I'm obviously not
 using fileinput.FileInput.readline() correctly.
 
 value of file is C:\myfile.txt
 value of the delimiter is ,
 I'm looking for  13 , in each currentLine...
 1,000688  ,ABCD,930020854,34,0,1, ,930020854
 ,  ,0,0,0,0
 
 2,000688  ,ABCD,930020854,99,0,1, ,930020854 , 
 ,0,0,0,0
 
 short line found at line 3
 Traceback (most recent call last):
   File "C:\Documents and Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line 35, in <module>
     nextLine = fileinput.FileInput.readline(args.file)
   File "C:\Python32\lib\fileinput.py", line 301, in readline
     line = self._buffer[self._bufindex]
 AttributeError: 'str' object has no attribute '_buffer'
 
 
 Can someone explain to me how I am supposed to make use of readline() to
 grab the next line of a text file please? It may be that I should be using
 some other module, but chose fileinput as I was hoping to make the little
 routine as generic as possible; able to spot short lines in tab separated,
 comma separated, pipe separated, ^~~^ separated and anything else which my
 clients feel like sending me.

As you have already learned, the csv module is the best tool to address your
problem.

However, I'd like to show a generic way to get an extra item in a for-loop.

Instead of iterating over the iterable (a list or a FileInput object or 
whatever) you first convert it into an iterator explicitly with the iter() 
built-in function and keep the reference around:

iterable = ...
it = iter(iterable)

Then inside the for-loop you get an extra item with the next() function:

for item in it:
    if some_condition():
        extra = next(it)

next() also allows you to provide a default value; without it you may get a 
StopIteration exception when you apply it on an exhausted iterator.

Here's a self-contained example:

>>> items = "alpha- beta gamma- delta- epsilon zeta".split()
>>> it = iter(items)
>>> for item in it:
...     while item.endswith("-"):
...         item += next(it)
...     print(item)
...
alpha-beta
gamma-delta-epsilon
zeta
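
Applied to the short-record problem from the original post, the same iter()/next() idea might look like the rough sketch below. It assumes that a record with too few fields always continues on the line that follows it, which is not guaranteed for real data. The function name join_short_records and the sample lines are made up for illustration; they are not from the original script.

```python
def join_short_records(lines, delimiter=",", expected_fields=13):
    """Yield records, gluing continuation lines onto short ones.

    Assumption: a record with too few fields continues on the next line.
    """
    it = iter(lines)
    for line in it:
        record = line.rstrip("\r\n")
        while record.count(delimiter) + 1 < expected_fields:
            # The default value avoids StopIteration at the end of the input.
            extra = next(it, None)
            if extra is None:
                break  # nothing left to append; yield the short record as-is
            record += extra.rstrip("\r\n")
        yield record


# Hypothetical sample: three fields expected, second record split in two.
sample = [
    "a,b,c\n",
    "d,\n",
    "e,f\n",
    "g,h,i\n",
]
joined = list(join_short_records(sample, expected_fields=3))
print(joined)
```

Because the for-loop and next() share the same iterator, the continuation line is consumed inside the while-loop and is not processed a second time.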




Re: [Tutor] Simple text file processing using fileinput module. Grabbing successive lines failure

2012-07-02 Thread Joel Goldstick
On Mon, Jul 2, 2012 at 10:03 AM, Flynn, Stephen (L  P - IT)
steve.fl...@capita.co.uk wrote:
 Tutors,

 Whilst having a play around with reading in textfiles and reformatting them I 
 tried to write a python 3.2 script to read a CSV file, looking for any 
 records which were short (indicating that the data may well contain an 
 embedded CR/LF). I've attached a small sample file with a split record at 
 line 3, and my code.

 Call the code with

 Python pipesmoker.py MyFile.txt ,

 (first parameter is the file being read, second parameter is the field 
 separator... a comma in this case)

 I can read the file in, I can determine that I'm looking for records which 
 have 13 fields and I can find a record which is too short (line 3).

 What I can't do is read the successive line to a short line in order to 
 append it onto the end of short line before writing the entire amended line 
 out. I'm still thinking about how to persuade the fileinput module to leap 
 over the successor line so it doesn't get processed again.

 When I run the code as it stands, I get a traceback as I'm obviously not 
 using fileinput.FileInput.readline() correctly.

 value of file is C:\myfile.txt
 value of the delimiter is ,
 I'm looking for  13 , in each currentLine...
 1,000688      ,ABCD,930020854,34,0,1, ,930020854 ,  
         ,0,0,0,0

 2,000688      ,ABCD,930020854,99,0,1, ,930020854 ,     
      ,0,0,0,0

 short line found at line 3
 Traceback (most recent call last):
   File "C:\Documents and Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line 35, in <module>
     nextLine = fileinput.FileInput.readline(args.file)
   File "C:\Python32\lib\fileinput.py", line 301, in readline
     line = self._buffer[self._bufindex]
 AttributeError: 'str' object has no attribute '_buffer'


 Can someone explain to me how I am supposed to make use of readline() to grab 
 the next line of a text file please? It may be that I should be using some 
 other module, but chose fileinput as I was hoping to make the little routine 
 as generic as possible; able to spot short lines in tab separated, comma 
 separated, pipe separated, ^~~^ separated and anything else which my clients 
 feel like sending me.

Take a look at csv.reader:
http://docs.python.org/library/csv.html#csv.reader
It comes with Python and, according to the text near this link, it will
handle a situation where EOL characters are contained in quoted fields.
Will that help you?
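
A minimal sketch of that behaviour, assuming the embedded line break sits inside a quoted field. The sample data is invented for illustration. If the sender does not quote the field, csv.reader has no way to tell an embedded line break from a record boundary, and the record will still come out split.

```python
import csv
import io

# Hypothetical sample: the second record has a line break inside a quoted field.
data = 'id,name,note\r\n1,ABCD,"first line\r\nsecond line"\r\n2,EFGH,plain\r\n'

# newline="" leaves line endings untouched so the csv module can interpret them.
reader = csv.reader(io.StringIO(data, newline=""), delimiter=",")
rows = list(reader)

for row in rows:
    print(len(row), row)
```

When reading from a real file, the equivalent is open(path, newline="") before handing the file object to csv.reader; the delimiter argument can be swapped for a tab, a pipe, or any other single character.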


-- 
Joel Goldstick


Re: [Tutor] Simple text file processing using fileinput module. Grabbing successive lines failure

2012-07-02 Thread David Rock
* Flynn, Stephen (L  P - IT) steve.fl...@capita.co.uk [2012-07-02 15:03]:
 Tutors,
 
 Can someone explain to me how I am supposed to make use of readline()
 to grab the next line of a text file please? It may be that I should
 be using some other module, but chose fileinput as I was hoping to
 make the little routine as generic as possible; able to spot short
 lines in tab separated, comma separated, pipe separated, ^~~^
 separated and anything else which my clients feel like sending me.

There are a couple issues that you need to resolve.  For starters, there
is no guarantee that the successive line is actually part of the
preceding line.  It could very well be that the original line is simply
truncated, in which case trying to append the following line would be
incorrect.

What I typically do in a case like this is use a flag variable and pull
the offending line(s).  So, you need to first determine the best course
of action for resolving the inconsistency (e.g., how do you verify that
the following line belongs with the preceding one?).

Try checking the line: if it has fewer than 13 fields, then flag it,
store it in a buffer and continue.  The following line _should_ also
error, in which case you can try to resolve the two lines, or fail out
if the criteria aren't met.
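
One way the flag-and-buffer approach could be sketched is shown below. The criterion used here (the buffered text plus the next line must add up to exactly the expected number of fields) is only an assumption for illustration, as are the name resolve_short_lines and the sample data; real data may need a stricter check.

```python
def resolve_short_lines(lines, delimiter=",", expected_fields=13):
    """Buffer a short line and try to complete it with the line(s) after it."""
    buffered = None  # doubles as the flag: not None means a short line is pending
    for line in lines:
        line = line.rstrip("\r\n")
        candidate = line if buffered is None else buffered + line
        fields = candidate.count(delimiter) + 1
        if fields == expected_fields:
            yield candidate
            buffered = None
        elif fields < expected_fields:
            buffered = candidate  # still short: keep it and look at the next line
        else:
            # Too many fields: the pieces do not fit together, so fail out.
            raise ValueError("cannot resolve record: %r" % candidate)
    if buffered is not None:
        raise ValueError("input ended with a short record: %r" % buffered)


# Hypothetical sample: three fields expected, second record split in two.
sample = ["a,b,c", "d,", "e,f", "g,h,i"]
resolved = list(resolve_short_lines(sample, expected_fields=3))
print(resolved)
```

A truncated line followed by a complete one produces too many fields when the two are joined, so it raises an error instead of being silently merged.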

Essentially, your problem isn't with using fileinput, it's with how you
handle each line that comes in.

-- 
David Rock
da...@graniteweb.com




Re: [Tutor] Simple text file processing using fileinput module. Grabbing successive lines failure

2012-07-02 Thread Alan Gauld

On 02/07/12 15:03, Flynn, Stephen (L  P - IT) wrote:

 Whilst having a play around with reading in textfiles and reformatting them I
 tried to write a python 3.2 script to read a CSV file,


Best tool for csv files is the csv module, it covers most of the gotchas 
associated with such data.



 What I can't do is read the successive line to a short line in order to
 append it onto the end of short line before writing the
 entire amended line out.

Maybe so but we can't help with that because you haven't shown us any 
code related to that issue...



 I'm still thinking about how to persuade the fileinput module


fileinput is normally used when processing many similar files. It's not 
usually used when processing a single file. If you wanted to step onto 
the next file in the input list then fileinput would help there.

But processing lines within the file is up to you.


 I get a traceback as I'm obviously not using
 fileinput.FileInput.readline() correctly.


Nope, it doesn't look like it but you haven't posted enough code to be 
sure what is happening. But I'll take a guess...



 Traceback (most recent call last):
   File "C:\Documents and Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line 35, in <module>
     nextLine = fileinput.FileInput.readline(args.file)
   File "C:\Python32\lib\fileinput.py", line 301, in readline
     line = self._buffer[self._bufindex]
 AttributeError: 'str' object has no attribute '_buffer'


It looks like you are not creating an instance of the FileInput class.
You are trying to use the methods directly. Thus the class tries to 
execute the call by using args.file as self. But args.file is a string, 
not a FileInput instance, and it therefore finds no _buffer attribute.


Look at the documentation. The very first few lines show what you want:

--
This module implements a helper class and functions to quickly write a 
loop over standard input or a list of files. If you just want to read or 
write one file see open().


The typical use is:
import fileinput
for line in fileinput.input():
    process(line)
---

Note the reference to processing a single file with open() and note the 
absence of FileInput in the example code.


Further down it says:

--
The class which implements the sequence behavior provided by the module 
is available for subclassing as well:


class fileinput.FileInput([files[, inplace[, backup[, mode[, openhook]]]]])

Class FileInput is the implementation; its methods filename(), fileno(), 
lineno(), filelineno(), isfirstline(), isstdin(), nextfile() and close() 
correspond to the functions of the same name in the module. In addition 
it has a readline() method which returns the next input line,

-

So normally you don't need to use FileInput at all, unless you are 
creating some kind of specialized sub class version. But if you do
use it you need to use it like any other class and create an instance.
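
A small sketch of the difference, assuming a throwaway temporary file so the example stands on its own (the file contents and variable names are invented). fileinput.input() builds the FileInput instance, and readline() is then called on that instance and not on the class.

```python
import fileinput
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

collected = []
# fileinput.input() returns a FileInput instance for the given file(s).
with fileinput.input(files=[path]) as f:
    for line in f:
        if line.startswith("first"):
            # readline() is called on the instance, so self is a FileInput
            # object and not a filename string.
            line += f.readline()
        collected.append(line)

os.remove(path)
print(collected)
```

The line fetched with readline() is consumed from the same stream, so the loop does not see it again.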

HTH,

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


