Re: Newbie completely confused

2007-09-25 Thread Roel Schroeven
Jeroen Hegeman wrote:
 Thanks for the comments,
 
 (First, I had to add timing code to ReadClasses: the code you posted
 doesn't include them, and only shows timings for ReadLines.)

 Your program uses quite a bit of memory. I guess it gets harder and
 harder to allocate the required amounts of memory.
 
 Well, I guess there could be something in that, but why is there a  
 significant increase after the first time? And after that, single- 
 trip time pretty much flattens out. No more obvious increases.

Sorry, I have no idea.

 If I change this line in ReadClasses:

  built_classes[len(built_classes)] = HugeClass(long_line)

 to

  dummy = HugeClass(long_line)

 then both times the files are read and your data structures are built,
 but after each run the data structure is freed. The result is that  
 both
 runs are equally fast.
 
 Isn't the 'del LINES' supposed to achieve the same thing? And  
 really, reading 30MB files should not be such a problem, right? (I'm  
 also running with 1GB of RAM.)

'del LINES' deletes the lines that are read from the file, but not all 
of your data structures that you created out of them.
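
To illustrate with a tiny sketch (made-up names, not your actual code): 'del'
only unbinds one name, so anything still referenced from the structures built
out of the lines stays in memory:

lines = ['some long line'] * 3      # stands in for the lines read from the file
built = {}
for i, line in enumerate(lines):
    built[i] = line                 # the built structures keep their own references

del lines                           # only the name 'lines' goes away
print built[0]                      # the data is still reachable, so still in memory
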
Now, indeed, reading 30 MB files should not be a problem. And I am 
confident that just reading the data is not a problem. To make sure I 
created a simple test:

import time

input_files = ["./test_file0.txt", "./test_file1.txt"]

total_start = time.time()
data = {}
for input_fn in input_files:
 file_start = time.time()
 f = file(input_fn, 'r')
 data[input_fn] = f.read()
 f.close()
 file_done = time.time()
 print '%s: %f to read %d bytes' % (input_fn, file_done - file_start, len(data))
total_done = time.time()
print 'all done in %f' % (total_done - total_start)


When I run that with test_file0.txt and test_file1.txt as you described 
(each 30 MB), I get this output:

./test_file0.txt: 0.26 to read 1 bytes
./test_file1.txt: 0.251000 to read 2 bytes
all done in 0.521000

Therefore I think the problem is not in reading the data, but in 
processing it and creating the data structures.

 You read the files, but don't use the contents; instead you use
 long_line over and over. I suppose you do that because this is a test,
 not your actual code?
 
 Yeah ;-) (Do I notice a lack of trust in the responses I get? Should  
 I not mention 'newbie'?)

I didn't mean to attack you; it's just that the program reads 30 MB of 
data, twice, but doesn't do anything with it. It only uses the data that 
was stored in long_lines, and which is never replaced. That is very 
strange for real code, but as a test it can have its uses. That's why I 
asked.

 Let's get a couple of things out of the way:
 - I do know about meaningful variable names and case-conventions,  
 but ... First of all I also have to live with inherited code (I don't  
 like people shouting in their code either), and secondly (all the  
 itemx) most of these members normally _have_ descriptive names but  
 I'm not supposed to copy-paste the original code to any newsgroups.

Ok.

 - I also know that a plain 'return' in python does not do anything  
 but I happen to like them. Same holds for the sys.exit() call.

Ok.

 - The __init__ methods normally actually do something: they  
 initialise some member variables to meaningful values (by calling the  
 clear() method, actually).
 - The __clear__ method normally brings objects back into a well- 
 defined 'empty' state.
 - The __del__ methods are actually needed in this case (well, in the  
 _real_ code anyway). The python code loads a module written in C++  
 and some of the member variables actually point to C++ objects  
 created dynamically, so one actually has to call their destructors  
 before unbinding the python var.

That sounds a bit weird to me; I would think such explicit memory 
management belongs in the C++ code instead of in the Python code, but I 
must admit that I know next to nothing about extending Python so I 
assume you are right.

 All right, thanks for the tips. I guess the issue itself is still  
 open, though.

I'm afraid so. Sorry I can't help.

One thing that helped me in the past to speed up input is using memory 
mapped I/O instead of stream I/O. But that was in C++ on Windows; I 
don't know if the same applies to Python on Linux.
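
In Python the mmap module exposes memory-mapped files; a rough sketch of what
that could look like (untested here, and the file name is just the test file
from above):

import mmap

f = open('./test_file0.txt', 'rb')
m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
try:
    line = m.readline()             # mmap objects support readline()
    while line:
        # ... process the line ...
        line = m.readline()
finally:
    m.close()
    f.close()

Whether that actually beats plain file iteration for your workload is
something you'd have to measure.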

-- 
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
   -- Isaac Asimov

Roel Schroeven


Re: Newbie completely confused

2007-09-25 Thread Roel Schroeven
Roel Schroeven wrote:
 import time
 
 input_files = ["./test_file0.txt", "./test_file1.txt"]
 
 total_start = time.time()
 data = {}
 for input_fn in input_files:
  file_start = time.time()
  f = file(input_fn, 'r')
  data[input_fn] = f.read()
  f.close()
  file_done = time.time()
  print '%s: %f to read %d bytes' % (input_fn, file_done - file_start, len(data))

... that should of course be len(data[input_fn]) ...

 total_done = time.time()
 print 'all done in %f' % (total_done - total_start)
 
 
 When I run that with test_file0.txt and test_file1.txt as you described 
 (each 30 MB), I get this output:
 
 ./test_file0.txt: 0.26 to read 1 bytes
 ./test_file1.txt: 0.251000 to read 2 bytes
 all done in 0.521000

... and then that becomes:

./test_file0.txt: 0.29 to read 3317 bytes
./test_file1.txt: 0.231000 to read 3317 bytes
all done in 0.521000


-- 
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
   -- Isaac Asimov

Roel Schroeven


Re: Newbie completely confused

2007-09-24 Thread Jeroen Hegeman
Thanks for the comments,


 (First, I had to add timing code to ReadClasses: the code you posted
 doesn't include them, and only shows timings for ReadLines.)

 Your program uses quite a bit of memory. I guess it gets harder and
 harder to allocate the required amounts of memory.

Well, I guess there could be something in that, but why is there a  
significant increase after the first time? And after that, single- 
trip time pretty much flattens out. No more obvious increases.


 If I change this line in ReadClasses:

  built_classes[len(built_classes)] = HugeClass(long_line)

 to

   dummy = HugeClass(long_line)

 then both times the files are read and your data structures are built,
 but after each run the data structure is freed. The result is that  
 both
 runs are equally fast.

Isn't the 'del LINES' supposed to achieve the same thing? And  
really, reading 30MB files should not be such a problem, right? (I'm  
also running with 1GB of RAM.)

 I'm not sure how to speed things up here... you're doing much  
 processing
 on a lot of small chunks of data. I have a number of observations and
 possible improvements though, and some might even speed things up a  
 bit.

Cool thanks, let's go over them.


 You read the files, but don't use the contents; instead you use
 long_line over and over. I suppose you do that because this is a test,
 not your actual code?

Yeah ;-) (Do I notice a lack of trust in the responses I get? Should  
I not mention 'newbie'?)

Let's get a couple of things out of the way:
- I do know about meaningful variable names and case-conventions,  
but ... First of all I also have to live with inherited code (I don't  
like people shouting in their code either), and secondly (all the  
itemx) most of these members normally _have_ descriptive names but  
I'm not supposed to copy-paste the original code to any newsgroups.
- I also know that a plain 'return' in python does not do anything  
but I happen to like them. Same holds for the sys.exit() call.
- The __init__ methods normally actually do something: they  
initialise some member variables to meaningful values (by calling the  
clear() method, actually).
- The __clear__ method normally brings objects back into a well- 
defined 'empty' state.
- The __del__ methods are actually needed in this case (well, in the  
_real_ code anyway). The python code loads a module written in C++  
and some of the member variables actually point to C++ objects  
created dynamically, so one actually has to call their destructors  
before unbinding the python var.

I tried to get things down to as small as possible, but when I found  
out that the size of the classes seems to contribute to the issue  
(removing enough member variables will bring you to a point where all  
of a sudden the speed increases by a factor of ten; there seems to be some  
breakpoint depending on the size of the classes) I could not simply  
remove all members but had to give them funky names. I kept the main  
structure of things, though, to see if that would solicit comments.  
(And it did...)



 In a number of cases, you use a dict like this:

  built_classes = {}
  for i in LINES:
      built_classes[len(built_classes)] = ...

 So you're using the indices 0, 1, 2, ... as the keys. That's not what
 dictionaries are made for; lists are much better for that:

  built_classes = []
  for i in LINES:
      built_classes.append(...)

Yeah, I inherited that part...


 Your readLines() function reads a whole file into memory. If you're
 working with large files, that's not such a good idea. It's better to
 load one line at a time into memory and work on that. I would even
 completely remove readLines() and restructure ReadClasses() like this:

Actually, part of what I removed was the real reason why readLines()  
is there at all: it reads files in blocks of (at most) some_number  
lines, and keeps track of the line offset in the file. I kept this  
structure hoping that someone would point out something obvious like  
some internal buffer going out of scope or whatever.
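
Just to give an idea of the shape of that part (a rough sketch with made-up
names, not the real inherited code):

def read_block(in_file, offset, max_lines):
    # Read at most max_lines lines starting at line number 'offset';
    # return the lines plus the offset of the next unread line.
    lines = []
    for dummy in xrange(max_lines):
        line = in_file.readline()
        if not line:
            break
        lines.append(line)
    return lines, offset + len(lines)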

All right, thanks for the tips. I guess the issue itself is still  
open, though.

Cheers,
Jeroen

Jeroen Hegeman
jeroen DOT hegeman AT gmail DOT com

WARNING: This message may contain classified information. Immediately  
burn this message after reading.





Re: Newbie completely confused

2007-09-24 Thread Jeroen Hegeman


 Your code does NOT include any statements that could have produced the
 above line of output -- IOW, you have not posted the code that you
 actually ran.

Oh my, I must have cleaned it up a bit too much, hoping that people  
would focus on the issue instead of the formatting of the output  
strings! Did you miss your morning coffee???

 Your code is already needlessly monstrously large.
Which I realised and apologised for beforehand.


 And Python 2.5.1 does what? Strike 3.

Hmm, I must have missed where it said that you can only ask for help  
if you're using the latest version... In case you're wondering, 2.5.1  
is not _really_ that wide-spread as most of the older versions.

 For handling the bit extraction stuff, either
[snip]
 (b) do a loop over the bit positions

Now that sounds more useful. I'll give that a try.

Thanks,
Jeroen

Jeroen Hegeman
jeroen DOT hegeman AT gmail DOT com

WARNING: This message may contain classified information. Immediately  
burn this message after reading.





Re: Newbie completely confused

2007-09-24 Thread Istvan Albert
Two comments,

 ...
 self.item3 = float(foo[c]); c+=1
 self.item4 = float(foo[c]); c+=1
 self.item5 = float(foo[c]); c+=1
 self.item6 = float(foo[c]); c+=1
 ...

this here (and your code in general) is mind boggling and not in a
good way,

as for your original question, I don't think that reading in files of
the size you mention can cause any substantial problems, I think the
problem is somewhere else,

you can run the code below to see that the read times are unaffected
by the order of processing

--

import timeit

# make a big file
NUM = 10**5
fp = open('bigfile.txt', 'wt')
longline = ' ABC ' * 60 + '\n'
for count in xrange(NUM):
    fp.write(longline)
fp.close()

setup1 = """
def readLines():
    data = []
    for line in file('bigfile.txt'):
        data.append(line)
    return data
"""

stmt1 = """
data = readLines()
"""

stmt2 = """
data = readLines()
data = readLines()
"""

stmt3 = """
data = file('bigfile.txt').readlines()
"""

def run(setup, stmt, N=5):
    t = timeit.Timer(stmt=stmt, setup=setup)
    msec = 1000 * t.timeit(number=N) / N
    print "%f msec/pass" % msec

if __name__ == '__main__':
    for stmt in (stmt1, stmt2, stmt3):
        run(setup=setup1, stmt=stmt)





Re: Newbie completely confused

2007-09-24 Thread John Machin
On Sep 25, 1:51 am, Jeroen Hegeman [EMAIL PROTECTED] wrote:
  Your code does NOT include any statements that could have produced the
  above line of output -- IOW, you have not posted the code that you
  actually ran.

 Oh my, I must have cleaned it up a bit too much, hoping that people
 would focus on the issue instead of the formatting of the output
 strings! Did you miss your morning coffee???

The difference was not a formatting difference; it was complete
absence of a statement, raising the question of what other non-obvious
differences there might be.

You miss the point: if it is obvious that the posted code did not
produce the posted output (common when newbies are thrashing around
trying to solve a problem), some of the audience may not bother trying
to help with the main issue -- they may attempt to help with side
issues (as I did with the fugly code bloat) or just ignore you
altogether.


  Your code is already needlessly monstrously large.

 Which I realised and apologised for beforehand.

An apology does not change the fact that the code was needlessly large
(AND needed careful post-linefolding reformatting just to make it
runnable) and so some may not have bothered to read it.


  And Python 2.5.1 does what? Strike 3.

 Hmm, I must have missed where it said that you can only ask for help
 if you're using the latest version...

You missed the point again: that your problem may be fixed in a later
version.

 In case you're wondering, 2.5.1
 is not _really_ that wide-spread as most of the older versions.

I wasn't wondering. I know. I maintain a package (xlrd) which works on
Python 2.5 all the way back to 2.1. It occasionally has possibly
similar "second iteration goes funny" issues (e.g. when reading 120MB
Excel spreadsheet files one after the other). You mention that
removing some attributes from a class may make your code stop
exhibiting cliff-face behaviour. If you can produce two versions of
your code that actually demonstrate the abrupt change, I'd be quite
interested in digging into it, to our possible mutual benefit.

  For handling the bit extraction stuff, either
 [snip]
  (b) do a loop over the bit positions

 Now that sounds more useful. I'll give that a try.


I'm glad you found something possibly more useful in my posting :-)

Cheers,
John




Re: Newbie completely confused

2007-09-23 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Gabriel
Genellina wrote:

 On Fri, 21 Sep 2007 13:34:40 -0300, Jeroen Hegeman
 [EMAIL PROTECTED] wrote:
 
 class ModerateClass:
     def __init__(self):
         return
     def __del__(self):
         pass
         return

 class HugeClass:
     def __init__(self, line):
         self.clear()
         self.input(line)
         return
     def __del__(self):
         del self.B4v
         return
     def clear(self):
         self.long_classes = {}
         self.B4v = {}
         return
 
 (BTW, all those return statements are redundant and useless)

The OP could be trying to use them as some kind of textual indicator of the
end of the function. Myself, I prefer end-comments, e.g.

class HugeClass :

    ...

    def clear(self) :
        ...
    #end clear

#end HugeClass


Re: Newbie completely confused

2007-09-22 Thread Roel Schroeven
Jeroen Hegeman wrote:
 ...processing all 2 files found
 -- 1/2: ./test_file0.txt
 Now reading ...
 DEBUG readLines A took 0.093 s
 ...took 8.85717201233 seconds
 -- 2/2: ./test_file0.txt
 Now reading ...
 DEBUG readLines A took 3.917 s
 ...took 12.8725550175 seconds
 
 So the first time around the file gets read in in ~0.1 seconds, the  
 second time around it needs almost four seconds! As far as I can see  
 this is related to 'something in memory being copied around' since if  
 I replace the 'alternative 1' by the 'alternative 2', basically  
 making sure that my classes are not used, reading time the second  
 time around drops back to normal (= roughly what it is the first pass).

(First, I had to add timing code to ReadClasses: the code you posted 
doesn't include them, and only shows timings for ReadLines.)

Your program uses quite a bit of memory. I guess it gets harder and 
harder to allocate the required amounts of memory.

If I change this line in ReadClasses:

 built_classes[len(built_classes)] = HugeClass(long_line)

to

dummy = HugeClass(long_line)

then both times the files are read and your data structures are built, 
but after each run the data structure is freed. The result is that both 
runs are equally fast.

Also, if I run the first version (without the dummy) on a computer with 
a bit more memory (1 GiB), it seems there is no problem allocating 
memory: both runs are equally fast.

I'm not sure how to speed things up here... you're doing much processing 
on a lot of small chunks of data. I have a number of observations and 
possible improvements though, and some might even speed things up a bit.

You read the files, but don't use the contents; instead you use 
long_line over and over. I suppose you do that because this is a test, 
not your actual code?

__init__() with nothing (or only return) in it is not useful; better to 
just leave it out.


You have a number of return statements that don't do anything (i.e. they 
return nothing (None actually) at the end of the function). A function 
without return automatically returns None at the end, so it's better to 
leave them out.

Similarly you don't need to call sys.exit(): the script will terminate 
anyway if it reaches the end. Better leave it out.


LongClass.clear() doesn't do anything and isn't called anyway; leave it out.


ModerateClass.__del__() doesn't do anything either. I'm not sure how it 
affects what happens if ModerateClass gets freed, but I suggest you 
don't start messing with __del__() until you have more Python knowledge 
and experience. I'm not sure why you think you need to implement that 
method.
The same goes for HugeClass.__del__(). It does delete self.B4v, but the 
default behavior will do that too. Again, I don't get why you want to 
override the default behavior.


In a number of cases, you use a dict like this:

  built_classes = {}
  for i in LINES:
      built_classes[len(built_classes)] = ...

So you're using the indices 0, 1, 2, ... as the keys. That's not what 
dictionaries are made for; lists are much better for that:

  built_classes = []
  for i in LINES:
      built_classes.append(...)


HugeClass.B4v isn't used, so you can safely remove it.


Your readLines() function reads a whole file into memory. If you're 
working with large files, that's not such a good idea. It's better to 
load one line at a time into memory and work on that. I would even 
completely remove readLines() and restructure ReadClasses() like this:

def ReadClasses(filename):
    print 'Now reading ...'

    built_classes = []

    # Open file
    in_file = open(filename, 'r')

    # Read lines and interpret them.
    time_a = time.time()
    for i in in_file:
        ## This is alternative 1.
        built_classes.append(HugeClass(long_line))
        ## The next line is alternative 2.
        ##built_classes[len(built_classes)] = long_line

    in_file.close()
    time_b = time.time()
    print "DEBUG readClasses took %.3f s" % (time_b - time_a)

Personally I only use 'i' for integer indices (as in 'for i in 
range(10)'); for other use I prefer more descriptive names:

 for line in in_file: ...

But I guess that's up to personal preference. Also you used LINES to 
store the file contents; the convention is that names with all capitals 
are used for constants, not for things that change.


In ProcessList(), you keep the index in a separate variable. Python has 
a trick so you don't have to do that yourself:

 nfiles = len(input_files)
 for file_index, i in enumerate(input_files):
     print "-- %i/%i: %s" % (file_index + 1, nfiles, i)
     ReadClasses(i)


Instead of item0, item1, ... , it's generally better to use a list, so 
you can use item[0], item[1], ...
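
For example, a rough sketch using the names from the posted code (pretending
for a moment that every field is a float):

self.item = []
for dummy in range(20):                  # 20 is just a placeholder count
    self.item.append(float(foo[c])); c += 1
# ...and later use self.item[0], self.item[1], ...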


And finally, John Machin's suggestion looks like a good way to 
restructure that long sequence of conversions and assignments in HugeClass.


-- 
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
   -- Isaac Asimov

Roel Schroeven

Newbie completely confused

2007-09-21 Thread Jeroen Hegeman
Dear Pythoneers,

I'm moderately new to python and it got me completely lost already.

I've got a bunch of large (30MB) txt files containing one 'event' per  
line. I open files after each other, read them line by line and from  
each line build a 'data structure' of a main class (HugeClass)  
containing some simple information as well as several instances of  
some other classes.

No problem so far, but I noticed that the first file was always  
faster than the others, whereas I would expect it to be slower, if  
anything. Testing with two copies of the same file shows the same  
behaviour.

Below is a (rather large, I'll explain) chunk of code. I ran this in  
a directory with two test files called 'test_file0.txt' and  
'test_file1.txt', each containing 10k lines of the same information  
as the 'long_line' variable in the code. This shows the following  
timing (consistently) for the little piece of code that reads all  
lines from file:

...processing all 2 files found
-- 1/2: ./test_file0.txt
Now reading ...
DEBUG readLines A took 0.093 s
...took 8.85717201233 seconds
-- 2/2: ./test_file0.txt
Now reading ...
DEBUG readLines A took 3.917 s
...took 12.8725550175 seconds

So the first time around the file gets read in in ~0.1 seconds, the  
second time around it needs almost four seconds! As far as I can see  
this is related to 'something in memory being copied around' since if  
I replace the 'alternative 1' by the 'alternative 2', basically  
making sure that my classes are not used, reading time the second  
time around drops back to normal (= roughly what it is the first pass).

I already want to apologise for the size of the code chunk below. I  
know about 'minimal reproducible examples' and such but I found out  
that if I commented out the filling (and thus binding) of some of the  
member variables in the lower-level classes, the problem (sometimes)  
also disappears. That also points to some magic happening in memory?

I probably mucked something up but I'm really lost as to where. Any  
help would be appreciated.

The original problem showed up using Python 2.4.3 under linux (Fedora  
Core 1).
Python 2.3.5 on OS X 10.4.10 (PPC) appears not to show this issue(?).

Thanks,
Jeroen

P.S. Any ideas on optimising the input to the classes would be  
welcome too ;-)

Jeroen Hegeman
jeroen DOT hegeman AT gmail DOT com



===Start of code chunk=
#!/usr/bin/env python

import time
import sys
import os
import gzip
import pdb

long_line =  
1,31905,0,174501,46152419,2117961,143,-1.,51,2,-19.9139,42,-19.9140 
, 
6.6002,0,0,0,46713.1484,2,0.,-1,1.4203220606,0.3876158297,147.121017 
4561,147.1284120973,-2,0.,-1,1.5887237787,-2.4011900425,-319.7776794 
434,319.7906836817,4,21,0.,-1,-0.5672637224,2.2052443027,-43.2842369 
080,43.3440905719,21,0.,-1,-0.8540721536,0.0770076364,-22.7033920288 
, 
22.7195827425,21,0.,-1,0.1623233557,0.5845987201,-28.0794525146,28.0 
860084170,21,0.,-1,0.1943928897,-0.2195242196,-22.0666370392,22.0685 
899391,6,0.,-1,-40.1810989380,-127.0743789673,-104.9231948853,239.74 
36794163,-6,0.,-1,43.2013626099,125.0640945435,-67.7339172363,227.17 
53587387,24,0.,-1,-57.9123306274,-17.3483123779,-71.8334121704,123.4 
397648033,-24,0.,-1,84.0985488892,54.4542312622,-62.4525032043,144.5 
299239704,5,0.,-1,17.7312316895,-109.7260665894,-33.0897827148,116.3 
039146130,-5,0.,-1,-40.8971862793,70.6098632812,-5.2814140320,82.645 
4347683,4,0.,-1,-6.2859884724,-17.9586020410,-58.9464384913,69.40294 
68585,-3,0.,-1,-51.6263811588,0.6104701459,-12.8869901896,54.0368221 
571,3,0.,-1,16.4690684490,48.0271777511,-51.7867884636,74.5327484701 
,-4,0.,-1,67.6295298338,6.4269350171,-10.6658525467,69.9971834876,7, 
7,1.0345464706e+01,-7.0800781250e+01,-2.0385742187e+01,7.5256346272e 
+01,1.3148,0.0072,0.0072,1.3148,0.0072,0.0072,1.0255,1.0413,0.0,0.0,0.0, 
0.0,-1.0,-4.2383,49.5276,13,0.1537,0.5156,0,0.9982,0.0034,1.,7,1,0.9 
566,0.0062,1,0,2,1.2736,1,7.8407,1,0,2,1.2736,1,7.8407,0,0,-1.0,-1.0,5,1 
,-2.4047853470e+01,4.0832519531e+01,-3.8452150822e+00,4.7851562559e 
+01,1.3383,0.0051,0.0051,1.3383,0.0051,0.0051,0.9340,0.9541,0.0,0.0,0.0, 
0.0,-1.0,-2.4609,21.3916,7,0.1166,0.5977,0,0.,0.0052,1.,9,1,0.99 
47,0.0063,1,0,2,0.7735,1,74.7937,1,0,2,0.7735,1,74.7937,0,0,-1.0,-1.0,5, 
1,-4.4067382812e+01,2.5634796619e+00,-1.1138916016e+01,4.6203614579e 
+01,1.3533,0.0054,0.0054,1.3533,0.0054,0.0054,1.0486,1.0903,0.0,0.0,0.0, 
0.0,-1.0,-3.9648,31.3733,13,0.1767,0.5508,100,0.9977,0.0040,1.,9,1,0 
. 
,0.4349,0,0,0,0.,0,-1000.,0,0,0,0.,0,-1000.,0,0,-1.0 
,-1.0,0,1,3.7200927734e+01,2.7465817928e+00,-5.5847163200e 
+00,3.7994386563e 
+01,1.3634,0.0062,0.0062,1.6488,0.0385,0.0385,0.7141,0.9013,5.3986899118 
e+00,6.6766492833e-01,-2.3780213181e-01,5.4460399892e 
+00,0.5504,-3.1445,0.7776,9,0.1169,0.7734,0,0.9977,0.0040,1.,7,1,0.0 

Re: Newbie completely confused

2007-09-21 Thread John Machin
On Sep 22, 2:34 am, Jeroen Hegeman [EMAIL PROTECTED] wrote:
[snip]
 ...processing all 2 files found
 -- 1/2: ./test_file0.txt
 Now reading ...
 DEBUG readLines A took 0.093 s
 ...took 8.85717201233 seconds

Your code does NOT include any statements that could have produced the
above line of output -- IOW, you have not posted the code that you
actually ran. Your code is already needlessly monstrously large.
That's two strikes against anyone bothering to try to nut out what's
going wrong, if indeed anything is going wrong.

[snip]

 The original problem showed up using Python 2.4.3 under linux (Fedora
 Core 1).
 Python 2.3.5 on OS X 10.4.10 (PPC) appears not to show this issue(?).

And Python 2.5.1 does what? Strike 3.


 P.S. Any ideas on optimising the input to the classes would be
 welcome too ;-)

1. What is the point of having a do-nothing __init__ method? I'd
suggest making the __init__ method do the input.

2. See below

[snip]

 class LongClass:

     def __init__(self):
         return
     def clear(self):
         return
     def input(self, foo, c):
         self.item0 = float(foo[c]); c += 1
         self.item1 = float(foo[c]); c += 1
 [multiple snips ahead]
         self.item18 = float(foo[c]); c += 1
         self.item19 = int(foo[c]); c += 1
         self.item20 = float(foo[c]); c += 1
         self.item27 = bool(int(foo[c])); c += 1
         self.item30 = (foo[c] == 1); c += 1
         self.item31 = (foo[c] == 1); c += 1
         self.item47 = bool(int(foo[c])); c += 1
         return c

at global level:

converters = [float] * 48
cvlist = [
    (int, (19, 22, 26, 34, 40, 46)),
    (lambda z: bool(int(z)), (27, 47)),
    (lambda z: z == 1, (30, 31, 36, 37, 42)),
]
for func, indexes in cvlist:
    for x in indexes:
        converters[x] = func
enumerated_converters = list(enumerate(converters))

Then:

 def input(self, foo, c):
     self.item = [func(foo[c+x]) for x, func in enumerated_converters]
     return c + 48

which requires you refer to obj.item[19] instead of obj.item19

If you *must* use item19 etc, then try this:

for x, func in enumerated_converters:
    setattr(self, "item%d" % x, func(foo[c+x]))

You could also (shock, horror) use meaningful names for the
attributes ... include a list of attribute names in the global stuff,
and put the relevant name in as the 2nd arg of setattr() instead of
itemxx.

For handling the bit extraction stuff, either

(a) conversion functions have a 2nd arg which defaults to None and
whose usage depends on the function itself ... would be mask or bit
position (or could be e.g. a scale factor for implied-decimal-point
input)

or

(b) do a loop over the bit positions
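
For (b), something along these lines (just a sketch; the field and the number
of packed flag bits are assumptions):

packed = int(foo[c]); c += 1            # assuming the flags arrive packed in one integer field
flags = []
for bit in range(8):                    # 8 is a made-up number of flag bits
    flags.append(bool((packed >> bit) & 1))
self.flags = flags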

HTH,
John



Re: Newbie completely confused

2007-09-21 Thread Gabriel Genellina
On Fri, 21 Sep 2007 13:34:40 -0300, Jeroen Hegeman  
[EMAIL PROTECTED] wrote:

 So the first time around the file gets read in in ~0.1 seconds, the
 second time around it needs almost four seconds! As far as I can see
 this is related to 'something in memory being copied around' since if
 I replace the 'alternative 1' by the 'alternative 2', basically
 making sure that my classes are not used, reading time the second
 time around drops back to normal (= roughly what it is the first pass).

 class ModerateClass:
     def __init__(self):
         return
     def __del__(self):
         pass
         return

 class HugeClass:
     def __init__(self, line):
         self.clear()
         self.input(line)
         return
     def __del__(self):
         del self.B4v
         return
     def clear(self):
         self.long_classes = {}
         self.B4v = {}
         return

Don't use __del__ unless it's absolutely necessary. ModerateClass.__del__  
does nothing, but its mere existence does not allow the garbage collector  
to work efficiently. If you explicitly call clear() from HugeClass, you  
can avoid using __del__ too. And if B4v is not involved in cycles,  
clearing it is not even necessary.
(BTW, all those return statements are redundant and useless)
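
A small demonstration of the __del__ point (class name made up): two instances
with a __del__ method that reference each other are never collected, they just
end up parked in gc.garbage:

import gc

class Node:
    def __del__(self):
        pass

a = Node()
b = Node()
a.other = b
b.other = a             # reference cycle between two finalizable objects
del a
del b

gc.collect()
print len(gc.garbage)   # -> 2: the cycle is found but deliberately not freed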


-- 
Gabriel Genellina
