Re: [Numpy-discussion] Huge arrays

2009-09-11 Thread Chad Netzer
On Tue, Sep 8, 2009 at 6:41 PM, Charles R Harris wrote:
>

> More precisely, 2GB for windows and 3GB for (non-PAE enabled) linux.

And just to further clarify, even with PAE enabled on linux, any
individual process still has about a 3 GB address limit (there are hacks to
raise that to 3.5 or 4 GB, but with a performance penalty).  But 4 GB
is the absolute maximum address space for a single 32-bit process (even
if the kernel itself can use up to 64 GB of physical RAM with PAE).
For gory details on Windows address space limits:

http://msdn.microsoft.com/en-us/library/bb613473%28VS.85%29.aspx

If running 64-bit is not an option, I'd consider the "compress in RAM"
technique.  Delta-compression for most sampled signals should be quite
doable.  Heck, here's some untested pseudo-code:

import numpy
import zlib

data_row = numpy.zeros(2000000, dtype=numpy.int16)  # one row of samples
# Fill up data_row with the measured samples here

compressed_row_strings = []
data_row[1:] = data_row[:-1] - data_row[1:]  # quick-n-dirty delta encoding

compressed_row_strings.append(zlib.compress(data_row.tostring()))

# Put a loop in there, reuse the row array, and you are almost all set.  The
# delta encoding is optional, but probably useful for most "real world" 1d
# signals.  If you don't have the time between samples to compress the whole
# row, break it into smaller chunks (see zlib.compressobj()).
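
# To get a row back later (again untested; this just inverts the delta step
# above with a cumulative sum -- the int64 accumulator avoids intermediate
# overflow, and the cast back to int16 wraps consistently):

raw = zlib.decompress(compressed_row_strings[0])
delta = numpy.frombuffer(raw, dtype=numpy.int16)
row = numpy.empty_like(delta)
row[0] = delta[0]
row[1:] = delta[0] - numpy.cumsum(delta[1:], dtype=numpy.int64)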

-C


Re: [Numpy-discussion] Huge arrays

2009-09-10 Thread David Cournapeau
Kim Hansen wrote:
>
> On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
>
> > Yes, this later is supported in PyTables as long as the underlying
> > filesystem
> > supports files > 2 GB, which is very usual in modern operating
> > systems.
>
> I think the OP said he was on Win32, in which case it should be noted:
> FAT32 has its upper file size limit at 4GB (minus one byte), so
> storing both your arrays as one file on a FAT32 partition is a no-no.
>
> David
>
>  
> Strange, I work on Win32 systems, and there I have no problems storing
> data files up to 600 GB (have not tried larger) in size stored on
> RAID0 disk systems of 2x1TB, I can also open them and seek in them
> using Python.

It is a FAT32 limitation, not a Windows limitation. NTFS should handle
large files without much trouble, and I believe the vast majority of
Windows installations (>= Windows XP) use NTFS and not FAT32. I
certainly have not seen Windows installed on FAT32 for a very long time.

cheers,

David


Re: [Numpy-discussion] Huge arrays

2009-09-10 Thread Kim Hansen
>
> On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
>
> > Yes, this later is supported in PyTables as long as the underlying
> > filesystem
> > supports files > 2 GB, which is very usual in modern operating
> > systems.
>
> I think the OP said he was on Win32, in which case it should be noted:
> FAT32 has its upper file size limit at 4GB (minus one byte), so
> storing both your arrays as one file on a FAT32 partition is a no-no.
>
> David
>

Strange, I work on Win32 systems, and there I have no problems storing data
files up to 600 GB in size (I have not tried larger) on RAID0 disk systems of
2x1TB; I can also open them and seek in them using Python. For those data
files, I use PyTables LZO-compressed h5 files to create and maintain an index
into the large data file. Besides some metadata describing chunks of data,
the index also contains a position value stating the file offset at which the
beginning of each data chunk (payload) is found. The index files I work with
in h5 format are not larger than 1.5 GB, though.
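
A rough sketch of what such an index can look like (untested; field and file
names are made up, not my actual schema, which carries more metadata; this
uses the open_file/create_table spelling of the PyTables API):

import tables

class ChunkIndex(tables.IsDescription):
    chunk_id = tables.Int64Col()   # running number of the data chunk
    position = tables.Int64Col()   # byte offset of the chunk (payload) in the raw file
    nbytes   = tables.Int64Col()   # payload size in bytes

filters = tables.Filters(complevel=1, complib="lzo")
h5 = tables.open_file("index.h5", mode="w")
table = h5.create_table(h5.root, "chunks", ChunkIndex, filters=filters)

row = table.row
row["chunk_id"], row["position"], row["nbytes"] = 0, 0, 4096   # one entry per chunk
row.append()
table.flush()
h5.close()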

It all works very nicely and is very convenient.

Kim


Re: [Numpy-discussion] Huge arrays

2009-09-09 Thread David Warde-Farley
On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:

> Yes, this later is supported in PyTables as long as the underlying  
> filesystem
> supports files > 2 GB, which is very usual in modern operating  
> systems.

I think the OP said he was on Win32, in which case it should be noted:  
FAT32 has its upper file size limit at 4GB (minus one byte), so  
storing both your arrays as one file on a FAT32 partition is a no-no.

David


Re: [Numpy-discussion] Huge arrays

2009-09-09 Thread Francesc Alted
On Wednesday 09 September 2009 10:48:48, Francesc Alted wrote:
> OTOH, having the possibility to manage compressed data buffers
> transparently in NumPy would help here, but not there yet ;-)

Now that I think about it, if the data is compressible, Daniel could try to
define a PyTables compressed array or table on disk and save chunks to it.
If the data compresses well enough, the filesystem cache will keep it in
memory until the disk can eventually absorb it.

For doing this, I would recommend using the LZO compressor, as it is one of
the fastest I've seen (at least until Blosc is ready): it can compress data up
to 5 times faster than the disk can absorb it (depending on how compressible
the data is and on the speed of the disk subsystem).
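
Something along these lines, for example (untested; file and node names are
made up, the 2000000-sample row length is taken from Daniel's post, and the
open_file/create_earray spelling of the PyTables API is used):

import numpy as np
import tables

filters = tables.Filters(complevel=1, complib="lzo")
h5 = tables.open_file("measurement.h5", mode="w")
# Extendable along the first axis; each append() adds complete rows,
# which are LZO-compressed on their way to disk.
earr = h5.create_earray(h5.root, "data", atom=tables.Int16Atom(),
                        shape=(0, 2000000), filters=filters,
                        expectedrows=512)

row = np.zeros((1, 2000000), dtype=np.int16)   # placeholder for one acquired row
earr.append(row)
h5.close()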

Of course, if the data is not compressible at all, then this avenue doesn't
make a lot of sense.

HTH,

-- 
Francesc Alted


Re: [Numpy-discussion] Huge arrays

2009-09-09 Thread Francesc Alted
On Wednesday 09 September 2009 07:22:33, David Cournapeau wrote:
> On Wed, Sep 9, 2009 at 2:10 PM, Sebastian Haase wrote:
> > Hi,
> > you can probably use PyTables for this. Even though it's meant to
> > save/load data to/from disk (in HDF5 format) as far as I understand,
> > it can be used to make your task solvable - even on a 32bit system !!
> > It's free (pytables.org) -- so maybe you can try it out and tell me if
> > I'm right 
>
> You still would not be able to load a numpy array > 2 Gb. Numpy memory
> model needs one contiguously addressable chunk of memory for the data,
> which is limited under the 32 bits archs. This cannot be overcome in
> any way AFAIK.
>
> You may be able to save data > 2 Gb, by appending several chunks < 2
> Gb to disk - maybe pytables supports this if it has large file support
> (which enables to write files > 2Gb on a 32 bits system).

Yes, the latter is supported in PyTables as long as the underlying filesystem
supports files > 2 GB, which is very common in modern operating systems.  This
even works on 32-bit systems, as the indexing machinery in Python has been
completely replaced inside PyTables.

However, I think that what Daniel is trying to achieve is to be able to keep 
all the info in-memory because writing it to disk is too slow.  I also agree 
that your suggestion to use a 64-bit OS (or 32-bit Linux, as it can address 
the full 3GB right out-of-the-box, as Chuck said) is the way to go.

OTOH, having the possibility to manage compressed data buffers transparently
in NumPy would help here, but we are not there yet ;-)

-- 
Francesc Alted


Re: [Numpy-discussion] Huge arrays

2009-09-08 Thread David Cournapeau
On Wed, Sep 9, 2009 at 2:10 PM, Sebastian Haase wrote:
> Hi,
> you can probably use PyTables for this. Even though it's meant to
> save/load data to/from disk (in HDF5 format) as far as I understand,
> it can be used to make your task solvable - even on a 32bit system !!
> It's free (pytables.org) -- so maybe you can try it out and tell me if
> I'm right 

You still would not be able to load a numpy array > 2 GB. NumPy's memory
model needs one contiguously addressable chunk of memory for the data,
which is limited on 32-bit architectures. This cannot be overcome in
any way AFAIK.

You may be able to save data > 2 GB by appending several chunks < 2 GB
to disk - maybe PyTables supports this if it has large file support
(which enables writing files > 2 GB on a 32-bit system).
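
Even with plain numpy I/O, the appending part is straightforward; a rough
sketch (untested; the file name is made up, the sizes are from the original
post):

import numpy as np

# At acquisition time: append one ~1 GB block at a time to the same file,
# so only a single block ever lives in memory.
block = np.zeros((256, 2000000), dtype=np.int16)   # placeholder for measured data
with open("capture.raw", "ab") as f:
    block.tofile(f)
del block

# At analysis time: read the blocks back one at a time, staying under the
# 32-bit address-space limit.
with open("capture.raw", "rb") as f:
    first = np.fromfile(f, dtype=np.int16, count=256 * 2000000).reshape(256, 2000000)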

cheers,

David


Re: [Numpy-discussion] Huge arrays

2009-09-08 Thread Sebastian Haase
Hi,
you can probably use PyTables for this. Even though it's meant to
save/load data to/from disk (in HDF5 format), as far as I understand
it can be used to make your task solvable - even on a 32-bit system!
It's free (pytables.org) -- so maybe you can try it out and tell me if
I'm right.
Or someone else here would know right away...

Cheers,
Sebastian Haase


On Wed, Sep 9, 2009 at 6:19 AM, Sturla Molden wrote:
> Daniel Platz skrev:
>> data1 = numpy.zeros((256,2000000),dtype=int16)
>> data2 = numpy.zeros((256,2000000),dtype=int16)
>>
>> This works for the first array data1. However, it returns with a
>> memory error for array data2. I have read somewhere that there is a
>> 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
>> be below that? I use Windows XP Pro 32 bit with 3GB of RAM.
>
> There is a 2 GB limit for user space on Win32, this is about 1.9 GB. You
> have other programs running as well, so this is still too much. Also
> Windows reserves 50% of RAM for itself, so you have less than 1.5 GB to
> play with.
>
> S.M.
>


Re: [Numpy-discussion] Huge arrays

2009-09-08 Thread Sturla Molden
Daniel Platz skrev:
> data1 = numpy.zeros((256,2000000),dtype=int16)
> data2 = numpy.zeros((256,2000000),dtype=int16)
>
> This works for the first array data1. However, it returns with a
> memory error for array data2. I have read somewhere that there is a
> 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
> be below that? I use Windows XP Pro 32 bit with 3GB of RAM.

There is a 2 GB limit for user space on Win32; in practice this is about 1.9 GB. You
have other programs running as well, so this is still too much. Also 
Windows reserves 50% of RAM for itself, so you have less than 1.5 GB to 
play with.

S.M.



Re: [Numpy-discussion] Huge arrays

2009-09-08 Thread Charles R Harris
On Tue, Sep 8, 2009 at 7:30 PM, Daniel Platz <mail.to.daniel.pl...@googlemail.com> wrote:

> Hi,
>
> I have a numpy newbie question. I want to store a huge amount of data
> in  an array. This data come from a measurement setup and I want to
> write them to disk later since there is nearly no time for this during
> the measurement. To put some numbers up: I have 2*256*2000000 int16
> numbers which I want to store. I tried
>
> data1 = numpy.zeros((256,2000000),dtype=int16)
> data2 = numpy.zeros((256,2000000),dtype=int16)
>
> This works for the first array data1. However, it returns with a
> memory error for array data2. I have read somewhere that there is a
> 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
> be below that? I use Windows XP Pro 32 bit with 3GB of RAM.
>
>
More precisely, 2 GB for Windows and 3 GB for (non-PAE enabled) Linux. The
rest of the address space is set aside for the operating system.  Note that
address space is not the same as physical memory, but it sets a limit on
what you can use, whether swap or real memory.

Chuck.


Re: [Numpy-discussion] Huge arrays

2009-09-08 Thread David Cournapeau
On Wed, Sep 9, 2009 at 9:30 AM, Daniel Platz wrote:
> Hi,
>
> I have a numpy newbie question. I want to store a huge amount of data
> in  an array. This data come from a measurement setup and I want to
> write them to disk later since there is nearly no time for this during
> the measurement. To put some numbers up: I have 2*256*2000000 int16
> numbers which I want to store. I tried
>
> data1 = numpy.zeros((256,2000000),dtype=int16)
> data2 = numpy.zeros((256,2000000),dtype=int16)
>
> This works for the first array data1. However, it returns with a
> memory error for array data2. I have read somewhere that there is a
> 2GB limit for numpy arrays on a 32 bit machine

This has nothing to do with numpy per se - that's a fundamental
limitation of 32-bit architectures. Each of your arrays is 1024 MB, so
you won't be able to create two of them.
The 2 GB limit is a theoretical upper limit; in practice it will
always be lower, if only because Python itself needs some memory.
There is also the memory fragmentation problem, which means allocating
one contiguous, almost-2 GB segment will be difficult.
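
For the record, the arithmetic (assuming the 2,000,000 samples per row from
the original post):

>>> 256 * 2000000 * 2        # samples per array times 2 bytes per int16
1024000000
>>> 2 * 1024000000 / 2.**30  # both arrays together, in GiB
1.9073486328125

so the two arrays alone already need essentially all of the ~2 GB of user
address space.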

> If someone has an idea to help me I would be very glad.

If you really need to deal with arrays that big, you should move to a 64-bit
architecture. That's exactly the problem they solve.

cheers,

David


[Numpy-discussion] Huge arrays

2009-09-08 Thread Daniel Platz
Hi,

I have a numpy newbie question. I want to store a huge amount of data
in an array. This data comes from a measurement setup, and I want to
write it to disk later since there is nearly no time for this during
the measurement. To put some numbers up: I have 2*256*2000000 int16
numbers which I want to store. I tried

data1 = numpy.zeros((256,2000000),dtype=int16)
data2 = numpy.zeros((256,2000000),dtype=int16)

This works for the first array, data1. However, it returns with a
memory error for array data2. I have read somewhere that there is a
2 GB limit for numpy arrays on a 32-bit machine, but shouldn't I still
be below that? I use Windows XP Pro 32-bit with 3 GB of RAM.

If someone has an idea to help me I would be very glad.

Thanks in advance.

Daniel