Re: Best way to parse file into db-type layout?

Steve Holden Sat, 30 Apr 2005 20:17:19 -0700

John Machin wrote:

On Sat, 30 Apr 2005 09:23:16 -0400, Steve Holden <[EMAIL PROTECTED]>
wrote:

John Machin wrote:
[...]

I wouldn't use fileinput for a "commercial data processing" exercise,
because it's slow, and (if it involved using the Python csv module) it
opens the files in text mode, and because in such exercises I don't
often need to process multiple files as though they were one file.

If the process runs once a month, and takes ten minutes to process the required data, isn't that fast enough?

Depends: (1) criticality: could it have been made to run in 5 minutes,
avoiding the accountant missing the deadline to EFT the taxes to the
government (or, worse, missing the last train home)?

Get real: if that's the timeline you don't need new software, you need a new accountant.

(2) "Many a mickle makes a muckle": the total of all run times could
be such that overnight processing doesn't complete before the day
shift turns up ...

Again, get real and stop nitpicking.

It's unwise to act as though "slow" is an absolute term.

When I am interested in multiple files -- more likely a script that
scans source files -- even though I wouldn't care about the speed or
the binary mode, I usually do something like:

for pattern in args: # args from an optparse parser
   for filename in glob.glob(pattern):
       for line in open(filename):

There is also an "on principle" element to it -- with
fileinput one has to use the awkish methods like filelineno() and
nextfile(); strikes me as a tricksy and inverted way of doing things.

But if it happens to be convenient for the task at hand, why deny the OP the use of a tool that can solve a problem? We shouldn't be so purist that we create extra (and unnecessary) work :-), and principles should be tempered with pragmatism in the real world.

If the job at hand is simulating awk's file reading habits, then yes,
fileinput is convenient. However, if the job at hand involves anything
like real-world commercial data processing requirements, then fileinput
is NOT convenient.

Yet again, get real. If someone tells me that fileinput meets their requirements who am I (not to mention who are *you*) to say they should invest extra effort in solving their problem some other way?

Example 1: Requirement is, for each input file, to display name of
file, number of records, and some data totals.

Example 2: Requirement is, if end of file occurs when not expected
(including, but not restricted to, the case of zero records) display
an error message and terminate abnormally.

Possibly these examples would have some force if they weren't simply invented.

I'd like to see some code for example 1 that used fileinput (on a list
of filenames) and didn't involve "extra (and unnecessary) work"
compared to the "for filename in alist / f = open(filename) / for line
in f" way of doing it.
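
For readers following the thread, here is a rough sketch of what Example 1
might look like both ways. It is not code from either poster. The record
layout (one record per line, with a numeric amount in the last
comma-separated field) and the names parse_amount, summarize_explicit and
summarize_fileinput are assumptions made up for illustration.

```python
import fileinput


def parse_amount(line):
    # Hypothetical record layout: the amount is the last comma-separated field.
    return float(line.rsplit(",", 1)[-1])


def summarize_explicit(filenames):
    """Per-file (name, record count, total) using one open() per file."""
    results = []
    for filename in filenames:
        count = 0
        total = 0.0
        with open(filename) as f:
            for line in f:
                count += 1
                total += parse_amount(line)
        results.append((filename, count, total))
    return results


def summarize_fileinput(filenames):
    """Same report via fileinput; file boundaries must be detected by hand."""
    results = []
    current = None
    count = 0
    total = 0.0
    # Note: an empty filenames list makes fileinput fall back to stdin.
    with fileinput.input(files=filenames) as stream:
        for line in stream:
            if stream.isfirstline():
                # A new file has started: flush the totals of the previous one.
                if current is not None:
                    results.append((current, count, total))
                current = stream.filename()
                count = 0
                total = 0.0
            count += 1
            total += parse_amount(line)
    # Flush the last file, if any lines were read at all.
    if current is not None:
        results.append((current, count, total))
    return results
```

In this sketch the fileinput version needs extra bookkeeping to notice file
boundaries, and a file with zero records never yields a line, so it is
silently left out of the report unless more code is added to catch it.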

If fileinput didn't exist, what do you think the reaction would be if
you raised a PEP to include it in the core?

Why should such speculation interest me?

regards
 Steve
--
Steve Holden        +1 703 861 4237  +1 800 494 3119
Holden Web LLC             http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/

--
http://mail.python.org/mailman/listinfo/python-list
