Memory problem

2006-08-14 Thread Yi Xing
Hi,

I need to read a large amount of data into a list. So I am trying to 
see if I'll have any memory problem. When I do
x=range(2700*2700*3) I got the following message:

Traceback (most recent call last):
File "", line 1, in ?
MemoryError

Any way to get around this problem? I have a machine with 4 GB of 
memory. The total number of data points (float) that I need to read 
is on the order of 200-300 million.

Thanks.

-- 
http://mail.python.org/mailman/listinfo/python-list


Memory problem

2006-08-14 Thread Yi Xing
I tried the following code:

>>> import array
>>> i=0
>>> n=2600*2600*30
>>> a=array.array("f")
>>> while (i<=n):
...     i=i+1
...     a.append(float(i))
...
Traceback (most recent call last):
  File "", line 3, in ?
MemoryError

to see the size of the array at the time of memory error:
>>> len(a)
8539248

I use Windows XP x64 with 4GB RAM.


Re: Memory problem

2006-08-14 Thread bearophileHUGS
Yi Xing wrote:
> I need to read a large amount of data into a list. So I am trying to
> see if I'll have any memory problem. When I do
> x=range(2700*2700*3) I got the following message:
> Traceback (most recent call last):
>   File "", line 1, in ?
> MemoryError
> Any way to get around this problem? I have a machine of 4G memory. The
> total number of data points (float) that I need to read is in the order
> of 200-300 millions.

If you know that you need floats only, then you can use a typed array
(an array.array) instead of an untyped array (a Python list):

import array
a = array.array("f")

You can also try with a numerical library like scipy, it may support up
to 2 GB long arrays.
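A minimal sketch of the typed-array approach (the element count here is
arbitrary, and the sizes assume a typical CPython build):

```python
import array

# A typed array stores raw 32-bit floats back to back, so each element
# costs itemsize bytes instead of a pointer plus a full float object.
a = array.array("f", (float(i) for i in range(1000)))

payload = len(a) * a.itemsize   # itemsize is 4 for typecode "f"
```

For 300 million points that works out to roughly 1.2 GB of payload,
which should fit in 4 GB of RAM.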

Bye,
bearophile



Re: Memory problem

2006-08-14 Thread John Machin
Yi Xing wrote:
> Hi,
>
> I need to read a large amount of data into a list. So I am trying to
> see if I'll have any memory problem. When I do
> x=range(2700*2700*3) I got the following message:
>
> Traceback (most recent call last):
>   File "", line 1, in ?
> MemoryError
>
> Any way to get around this problem? I have a machine of 4G memory. The
> total number of data points (float) that I need to read is in the order
> of 200-300 millions.
>

2700*2700*3 is only 21M. Your computer shouldn't have raised a sweat,
let alone MemoryError. Ten times that got me a MemoryError on a 1GB
machine.

A raw Python float takes up 8 bytes. On a 32-bit machine a float object
will have another 8 bytes of (type, refcount). Instead of a list, you
probably need to use an array.array (which works on homogeneous
contents, so it costs 8 bytes each float, not 16), or perhaps
numeric/numpy/scipy/...
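The overhead is easy to see with sys.getsizeof (available from Python
2.6 on; the exact figures vary by platform and build):

```python
import sys
import array

boxed = sys.getsizeof(1.0)          # a full float object: payload plus
                                    # type pointer, refcount, etc.
packed = array.array("d").itemsize  # 8: just the raw C double

assert packed == 8
assert boxed > packed               # boxing always costs extra
```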

HTH,
John



Re: Memory problem

2006-08-14 Thread John Machin

[EMAIL PROTECTED] wrote:

> If you know that you need floats only, then you can use a typed array
> (an array.array) instead of an untyped array (a Python list):
>
> import array
> a = array.array("f")
>

Clarification: typecode 'f' stores a Python float (64 bits,
equivalent to a C double) as a 32-bit FP number (equivalent to a C
float). Apart from the obvious loss of precision, a little extra
time is required to convert back and forth. You may consider the
trade-off worthwhile.
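The precision loss is visible on a value like 0.1, which is not exactly
representable as a 32-bit float:

```python
import array

single = array.array("f", [0.1])[0]  # stored as a C float, read back
double = array.array("d", [0.1])[0]  # stored as a C double

assert double == 0.1                 # typecode "d" round-trips exactly
assert single != 0.1                 # typecode "f" does not
assert abs(single - 0.1) < 1e-7      # but the error is tiny
```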

Cheers,
John



Re: Memory problem

2006-08-14 Thread Tim Chase
> I need to read a large amount of data into a list. So I am trying to 
> see if I'll have any memory problem. When I do
> x=range(2700*2700*3) I got the following message:
> 
> Traceback (most recent call last):
>   File "", line 1, in ?
> MemoryError
> 
> Any way to get around this problem? I have a machine of 4G memory. The 
> total number of data points (float) that I need to read is in the order 
> of 200-300 millions.

While others on the list have given you options for how to 
accommodate this monstrosity, you've not mentioned what you 
intend to do with the data once you've shoveled it all into ram.

Often, the easiest way to solve the problem is to prevent it from 
happening in the first place.  Is there any way to operate on 
your data in a stream-oriented fashion?  Or use a database 
filestore underneath?  This would allow you to operate on a much 
smaller scale, and perhaps simply gather some aggregate 
statistics while skimming along the data stream.
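A sketch of the stream-oriented idea (the generator below is a
stand-in for whatever file or source the data actually comes from):

```python
def summarize(values):
    """One pass over a stream of floats: count, sum, min, max.
    Nothing is kept in memory beyond the running aggregates."""
    count, total = 0, 0.0
    lo = hi = None
    for v in values:
        count += 1
        total += v
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return count, total, lo, hi

count, total, lo, hi = summarize(float(i) for i in range(1000))
```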

-tkc






Re: Memory problem

2006-08-14 Thread Larry Bates
Yi Xing wrote:
> Hi,
> 
> I need to read a large amount of data into a list. So I am trying to see
> if I'll have any memory problem. When I do
> x=range(2700*2700*3) I got the following message:
> 
> Traceback (most recent call last):
> File "", line 1, in ?
> MemoryError
> 
> Any way to get around this problem? I have a machine of 4G memory. The
> total number of data points (float) that I need to read is in the order
> of 200-300 millions.
> 
> Thanks.
> 
On my 1Gb machine this worked just fine, no memory error.

-Larry Bates


Re: Memory problem

2006-08-14 Thread Yi Xing
On a related question: how do I initialize a list or an array with a 
pre-specified number of elements, something like
int p[100] in C? I can do append() for 100 times but this looks silly...

Thanks.

Yi Xing



Re: Memory problem

2006-08-14 Thread Simon Forman

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
>
> Thanks.
>
> Yi Xing

You seldom need to do that in python, but it's easy enough:

new_list = [0 for notused in xrange(100)]

or if you already have a list:

my_list.extend(0 for notused in xrange(100))

HTH,
~Simon



Re: Memory problem

2006-08-14 Thread Thomas Nelson

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
> 
> Thanks.
> 
> Yi Xing

Use [0]*100 for a list.

THN



Re: Memory problem

2006-08-14 Thread John Machin
Yi Xing wrote:
> I tried the following code:
>
> >>> i=0
> >>> n=2600*2600*30
> >>> a=array.array("f")
> >>> while (i<=n):
> .. i=i+1
> .. a.append(float(i))

Not a good idea. The array has to be resized repeatedly; a realloc
may fail because of fragmentation, and plan B (malloc a fresh chunk
and copy) is likely to fail as well.
> ..
> Traceback (most recent call last):
>   File "", line 3, in ?
> MemoryError
>
> to see the size of the array at the time of memory error:
> >>>len(a)
> 8539248.

Incredible. That's only 34 MB. What is the size of your paging file?
What memory guzzlers were you running at the same time? What was the
Task Manager "Performance" pane showing while your test was running?
What version of Python?

FWIW I got up to len(a) == 122998164 (that's 14 times what you got) on
a machine with  only 1GB of memory and a 1523MB paging file, with
Firefox & ZoneAlarm running (the pagefile was showing approx 300MB in
use at the start of the test).

> I use Windows XP x64 with 4GB RAM.

Maybe there's a memory allocation problem with the 64-bit version.
Maybe MS just dropped in the old Win95 memory allocator that the timbot
used to fulminate about :-(

Cheers,
John



Re: Memory problem

2006-08-14 Thread Larry Bates
Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
> 
> Thanks.
> 
> Yi Xing
> 
Unlike other languages this is seldom done in Python.  I think you should
probably be looking at http://numeric.scipy.org/ if you want to have
"traditional" arrays of floats.

-Larry


Re: Memory problem

2006-08-14 Thread Yi Xing
Thanks! I just found that I have no problem with 
x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.

-Yi
On Aug 14, 2006, at 3:08 PM, Larry Bates wrote:

> Yi Xing wrote:
>> On a related question: how do I initialize a list or an array with a
>> pre-specified number of elements, something like
>> int p[100] in C? I can do append() for 100 times but this looks 
>> silly...
>>
>> Thanks.
>>
>> Yi Xing
>>
> Unlike other languages this is seldom done in Python.  I think you 
> should
> probably be looking at http://numeric.scipy.org/ if you want to have
> "traditional" arrays of floats.
>
> -Larry



Re: Memory problem

2006-08-14 Thread Yi Xing
Is there a way that I can define a two-dimensional array in 
array.array()? Thanks.
On Aug 14, 2006, at 2:28 PM, John Machin wrote:

> Yi Xing wrote:
>> I tried the following code:
>>
>> >>> i=0
>> >>> n=2600*2600*30
>> >>> a=array.array("f")
>> >>> while (i<=n):
>> ...     i=i+1
>> ...     a.append(float(i))
>
> Not a good idea. The array has to be resized, which may mean that a
> realloc won't work because of fragmentation, you're out of luck because
> plan B is to malloc another chunk, but that's likely to fail as well.
>> ..
>> Traceback (most recent call last):
>>   File "", line 3, in ?
>> MemoryError
>>
>> to see the size of the array at the time of memory error:
>> >>> len(a)
>> 8539248.
>
> Incredible. That's only 34 MB. What is the size of your paging file?
> What memory guzzlers were you running at the same time? What was the
> Task Manager "Performance" pane showing while your test was running?
> What version of Python?
>
> FWIW I got up to len(a) == 122998164 (that's 14 times what you got) on
> a machine with  only 1GB of memory and a 1523MB paging file, with
> Firefox & ZoneAlarm running (the pagefile was showing approx 300MB in
> use at the start of the test).
>
>> I use Windows XP x64 with 4GB RAM.
>
> Maybe there's a memory allocation problem with the 64-bit version.
> Maybe MS just dropped in the old Win95 memory allocator that the timbot
> used to fulminate about :-(
>
> Cheers,
> John
>



Re: Memory problem

2006-08-14 Thread John Machin

Yi Xing wrote:
> On a related question: how do I initialize a list or an array with a
> pre-specified number of elements, something like
> int p[100] in C? I can do append() for 100 times but this looks silly...
>
> Thanks.
>
> Yi Xing

In the case of an array, you may wish to consider the fromfile()
method.
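A rough sketch of that approach (the temporary file here is purely for
illustration): tofile() dumps the raw values and fromfile() reads a
known number of them back in one bulk operation, with no per-element
append.

```python
import array
import os
import tempfile

values = array.array("d", (float(i) for i in range(1000)))

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    values.tofile(f)            # raw machine values, no formatting

loaded = array.array("d")
with open(path, "rb") as f:
    loaded.fromfile(f, 1000)    # bulk-read exactly 1000 doubles
os.remove(path)

assert loaded == values
```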

Cheers,
John



Re: Memory problem

2006-08-14 Thread Martin v. Löwis
John Machin wrote:
> Incredible. That's only 34 MB. What is the size of your paging file?
> What memory guzzlers were you running at the same time? What was the
> Task Manager "Performance" pane showing while your test was running?
> What version of Python?

He didn't say Windows (so far). AFAICT, his system might be Linux, and
he might have an ulimit of 1GB (or some such).

Regards,
Martin


Re: Memory problem

2006-08-14 Thread Martin v. Löwis
Yi Xing wrote:
> Thanks! I just found that I have no problem with
> x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.

That's no surprise. In the first case, try

x[0][0] = 20.0
print x[1][0]

You have the very same (identical) list of 2560*2560 values in x
500 times.

To create such a structure correctly, do

x = [None] * 500
for i in range(500):
    x[i] = [10.0]*2560*2560

In any case, check ulimit(1).
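The aliasing is easy to demonstrate at small scale:

```python
# The outer * 3 copies references, not rows: all three "rows" are the
# same list object, so a write through one alias shows up everywhere.
shared = [[10.0] * 4] * 3
shared[0][0] = 20.0
assert shared[1][0] == 20.0
assert shared[0] is shared[1]

# Building each row in a loop (or comprehension) gives independent rows.
independent = [[10.0] * 4 for _ in range(3)]
independent[0][0] = 20.0
assert independent[1][0] == 10.0
```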

Regards,
Martin


Re: Memory problem

2006-08-14 Thread John Machin
Yi Xing wrote:
> Thanks! I just found that I have no problem with
> x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.
>

range(1*2560*2560*30) creates a list of 196M *unique* ints.
Assuming 32-bit ints and pointers: that's 4 bytes each for the value, 4
for the type pointer, 4 for the refcount and 4 for the actual list
element (a pointer to the 12-byte object). So that's one chunk of
4x196M = 786MB of contiguous list, plus 196M chunks of whatever size
gets allocated for a request of 12 bytes. Let's guess 16. So the
total memory you need is about 3920M.

Now let's look at [[10.0]*2560*2560]*500.
Firstly that creates a tiny list [10.0]. then you create a list that
contains 2560*2560 = 6.5 M references to that *one* object containing
10.0. That's 26MB. Then you make a list of 500 references to that big
list. This new list costs you 2000 bytes. Total required: about 26.2MB.
The minute you start having non-unique numbers instead of 10.0, this
all falls apart.

In any case, your above comparison is nothing at all to do with the
solution that you need, which as already explained will involve
array.array or numpy.

What you now need to do is answer the questions about your pagefile
etc.

Cheers,
John



Re: Memory problem

2006-08-14 Thread John Machin

Martin v. Löwis wrote:
> John Machin wrote:
> > Incredible. That's only 34 MB. What is the size of your paging file?
> > What memory guzzlers were you running at the same time? What was the
> > Task Manager "Performance" pane showing while your test was running?
> > What version of Python?
>
> He didn't say Windows (so far).

Yes he did -- reread his message, which I quoted in the message which
you have *partially* quoted above.
"""
I use Windows XP x64 with 4GB RAM. 
"""

Cheers,
John



Re: Memory problem

2006-08-14 Thread Marc 'BlackJack' Rintsch
In <[EMAIL PROTECTED]>, Yi Xing wrote:

> Is there a way that I can define a two-dimensional array in 
> array.array()? Thanks.

If you need more than one dimension you really should take a look at
`numarray` or `numpy`.  What are you going to do with the data once it's
loaded into memory?
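For completeness, a flat array.array can fake two dimensions with
row-major index arithmetic (the helpers below are illustrative, not a
real API), though numpy makes this unnecessary:

```python
import array

rows, cols = 3, 4
# One contiguous block of rows*cols 32-bit floats, all zero.
grid = array.array("f", [0.0]) * (rows * cols)

def get2d(a, r, c):
    return a[r * cols + c]      # row-major: element (r, c)

def set2d(a, r, c, value):
    a[r * cols + c] = value

set2d(grid, 2, 3, 7.0)
```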

Ciao,
Marc 'BlackJack' Rintsch



Re: Memory problem

2006-08-15 Thread Yi Xing
I used the array module and loaded all the data into an array. 
Everything works fine now.
On Aug 14, 2006, at 4:01 PM, John Machin wrote:

> Yi Xing wrote:
>> Thanks! I just found that I have no problem with
>> x=[[10.0]*2560*2560]*500, but x=range(1*2560*2560*30) doesn't work.
>>
>
> range(1*2560*2560*30) is creating a list of 196M *unique* ints.
> Assuming 32-bit ints and pointers: that's 4 bytes each for the value, 4
> for the type pointer, 4 for the refcount and  4 for the actual list
> element (a pointer to the 12-byte object). so that's one chunk of
> 4x196M = 786MB of contiguous list, plus 196M chunks each whatever size
> gets allocated for a request of 12 bytes. Let's guess at 16. So the
> total memory you need is 3920M.
>
> Now let's look at [[10.0]*2560*2560]*500.
> Firstly that creates a tiny list [10.0]. then you create a list that
> contains 2560*2560 = 6.5 M references to that *one* object containing
> 10.0. That's 26MB. Then you make a list of 500 references to that big
> list. This new list costs you 2000 bytes. Total required: about 26.2MB.
> The minute you start having non-unique numbers instead of 10.0, this
> all falls apart.
>
> In any case, your above comparison is nothing at all to do with the
> solution that you need, which as already explained will involve
> array.array or numpy.
>
> What you now need to do is answer the questions about your pagefile
> etc.
>
> Cheers,
> John
>



Update on Memory problem with NumPy arrays

2006-06-21 Thread sonjaa
Hi

last week I posted a problem with running out of memory when changing
values in NumPy arrays. Since then I have tried many different
approaches and work-arounds, but to no avail.

I was able to reduce the code (see below) to its smallest size and
still have the problem, albeit at a slower rate. The problem appears
to come from changing values in the array. Does this create another
reference to the array, which can't be released?

Also, are there other Python methods/extensions that can create
multi-dimensional arrays?

thanks again to those who responded to the last post
Sonja

PS. to watch the memory usage I just used task manager

the code:
from numpy import *

y = ones((501,501))
z = zeros((501,501))
it = 50

for kk in xrange(it):
    y[1,1] = 4
    y[1,2] = 4
    y[1,0] = 4
    y[2,1] = 6

    print "Iteration #:%s" % (kk)
    for ee in xrange(0,501):
        for ff in xrange(0,501):
            if y[ee,ff] == 4 or y[ee,ff] == 6:
                y[ee,ff] = 2
            else:
                pass



Re: Update on Memory problem with NumPy arrays

2006-06-21 Thread Robert Kern
sonjaa wrote:
> Hi
> 
> last week I posted a problem with running out of memory when changing
> values in NumPy arrays. Since then I have tried many different
> approaches and
> work-arounds but to no avail.
> 
> I was able to reduce the code (see below) to its smallest size and
> still
> have the problem, albeit at a slower rate.

Please post this to numpy-discussion instead of here. Also, please create a
ticket in our Trac:

  http://projects.scipy.org/scipy/numpy

> The problem appears to come
> from changing values in the array. Does this create another
> reference to the array, which can't be released?

Since the array shouldn't be going away and no new arrays should be
created, that wouldn't cause a problem.

It's possible that there is a bug with the scalar objects that are being created
when indexing into the arrays. I can reproduce abnormal memory consumption even
when I remove the line where you are setting value in the if: clause.

> Also, are there other Python methods/extensions that can create
> multi-dimensional arrays?

A few. numpy's predecessors, Numeric and numarray, are still usable, but
aren't being actively developed. There's a pure-Python array package
somewhere, but I forget the name.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



Re: Update on Memory problem with NumPy arrays

2006-06-21 Thread Fredrik Lundh
sonjaa wrote:

> Also, are there other Python methods/extensions that can create
> multi-dimensional arrays?

if this example is typical for the code you're writing, you might as 
well use nested Python lists:

def make_array(width, height, value):
    out = []
    for y in range(height):
        out.append([value] * width)
    return out

y = make_array(501, 501, 1)
z = make_array(501, 501, 0)

y[ee][ff] = 4

etc





Re: Update on Memory problem with NumPy arrays

2006-06-21 Thread Filip Wasilewski
sonjaa wrote:
> Hi
>
> last week I posted a problem with running out of memory when changing
> values in NumPy arrays. Since then I have tried many different
> approaches and
> work-arounds but to no avail.
[...]

Based on the numpy-discussion this seems to be fixed in the SVN now(?).

Anyway, you can use 'where' function to eliminate the loops:

from numpy import *

y = ones((501,501))
z = zeros((501,501))
it = 50

for kk in xrange(it):
    y[1,1] = 4
    y[1,2] = 4
    y[1,0] = 4
    y[2,1] = 6

    print "Iteration #:%s" % (kk)
    y = where((y == 4) | (y == 6), 2, y)


best,
fw



Re: Update on Memory problem with NumPy arrays

2006-06-21 Thread sonjaa
I've been in contact with Travis O, and he said it was fixed in the
SVN.
thanks for the suggestions, I'll try them out now.

best
Sonja


Filip Wasilewski wrote:
> sonjaa wrote:
> > Hi
> >
> > last week I posted a problem with running out of memory when changing
> > values in NumPy arrays. Since then I have tried many different
> > approaches and
> > work-arounds but to no avail.
> [...]
>
> Based on the numpy-discussion this seems to be fixed in the SVN now(?).
>
> Anyway, you can use 'where' function to eliminate the loops:
>
> from numpy import *
>
> y = ones((501,501))
> z = zeros((501,501))
> it = 50
>
> for kk in xrange(it):
>     y[1,1] = 4
>     y[1,2] = 4
>     y[1,0] = 4
>     y[2,1] = 6
>
>     print "Iteration #:%s" % (kk)
>     y = where((y == 4) | (y == 6), 2, y)
> 
> 
> best,
> fw
