Re: [Numpy-discussion] Huge arrays
On Tue, Sep 8, 2009 at 6:41 PM, Charles R Harris wrote:
> More precisely, 2GB for windows and 3GB for (non-PAE enabled) linux.

And just to further clarify: even with PAE enabled on Linux, any individual process still has about a 3 GB address limit (there are hacks to raise that to 3.5 or 4 GB, but with a performance penalty). 4 GB is the absolute maximum addressable space for a single 32-bit process, even though a PAE kernel itself can use up to 64 GB of physical RAM.

For the gory details on Windows address space limits:
http://msdn.microsoft.com/en-us/library/bb613473%28VS.85%29.aspx

If running 64-bit is not an option, I'd consider the "compress in RAM" technique. Delta compression should be quite doable for most sampled signals. Heck, here's some untested pseudo-code:

import numpy
import zlib

data_row = numpy.zeros(2000000, dtype=numpy.int16)
# ... fill up data_row with one row of samples ...

compressed_row_strings = []
data_row[1:] = data_row[1:] - data_row[:-1]  # quick n dirty delta encoding
compressed_row_strings.append(zlib.compress(data_row.tostring()))

# Put a loop in there, reuse the row array, and you are almost all set. The delta
# encoding is optional, but probably useful for most "real world" 1d signals.
# If you don't have the time between samples to compress the whole row, break
# it into smaller chunks (see zlib.compressobj()).

-C

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
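[For reference, a minimal round-trip sketch of the delta-plus-zlib idea above, with the decode step included. The function names are illustrative, not anything from numpy; note that the int16 subtraction wraps on overflow, which the int16 cumsum in the decoder undoes.]

```python
import numpy as np
import zlib

def compress_row(row):
    """Delta-encode an int16 row, then zlib-compress the raw bytes."""
    deltas = np.empty_like(row)
    deltas[0] = row[0]
    deltas[1:] = row[1:] - row[:-1]  # wraps on overflow; decode undoes it
    return zlib.compress(deltas.tobytes())

def decompress_row(blob):
    """Invert compress_row: decompress, then integrate the deltas."""
    deltas = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
    # cumsum in int16 so wraparound here cancels wraparound from encoding
    return np.cumsum(deltas, dtype=np.int16)

row = np.array([100, 105, 103, 103, 110], dtype=np.int16)
blob = compress_row(row)
assert np.array_equal(decompress_row(blob), row)
```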
Re: [Numpy-discussion] Huge arrays
Kim Hansen wrote:
>> On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
>>> Yes, this latter is supported in PyTables as long as the underlying
>>> filesystem supports files > 2 GB, which is very usual in modern
>>> operating systems.
>> I think the OP said he was on Win32, in which case it should be noted:
>> FAT32 has its upper file size limit at 4GB (minus one byte), so
>> storing both your arrays as one file on a FAT32 partition is a no-no.
>>
>> David
>
> Strange, I work on Win32 systems, and there I have no problems storing
> data files up to 600 GB (have not tried larger) in size on RAID0 disk
> systems of 2x1TB. I can also open them and seek in them using Python.

That is a FAT32 limitation, not a Windows limitation. NTFS handles large files without much trouble, and I believe the vast majority of Windows installations (>= Windows XP) use NTFS and not FAT32. I certainly have not seen Windows installed on FAT32 for a very long time.

cheers,

David
Re: [Numpy-discussion] Huge arrays
> On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
>> Yes, this latter is supported in PyTables as long as the underlying
>> filesystem supports files > 2 GB, which is very usual in modern
>> operating systems.
>
> I think the OP said he was on Win32, in which case it should be noted:
> FAT32 has its upper file size limit at 4GB (minus one byte), so
> storing both your arrays as one file on a FAT32 partition is a no-no.
>
> David

Strange, I work on Win32 systems, and there I have no problems storing data files up to 600 GB (have not tried larger) in size on RAID0 disk systems of 2x1TB. I can also open them and seek in them using Python.

For those data files, I use PyTables LZO-compressed h5 files to create and maintain an index into the large data file. Besides some metadata describing chunks of data, the index also contains a data position value stating the file position of the beginning of each data chunk (payload). The index files I work with in h5 format are not larger than 1.5 GB, though.

It all works very nicely and is very convenient.

Kim
Re: [Numpy-discussion] Huge arrays
On 9-Sep-09, at 4:48 AM, Francesc Alted wrote:
> Yes, this latter is supported in PyTables as long as the underlying
> filesystem supports files > 2 GB, which is very usual in modern
> operating systems.

I think the OP said he was on Win32, in which case it should be noted: FAT32 has its upper file size limit at 4 GB (minus one byte), so storing both your arrays as one file on a FAT32 partition is a no-no.

David
Re: [Numpy-discussion] Huge arrays
On Wednesday 09 September 2009 10:48:48, Francesc Alted wrote:
> OTOH, having the possibility to manage compressed data buffers
> transparently in NumPy would help here, but we are not there yet ;-)

Now that I think about it, in case the data is compressible, Daniel could try to define a PyTables compressed array or table on disk and save chunks to it. If the data is compressible enough, the filesystem cache will keep it in memory until the disk can eventually absorb it.

For doing this, I would recommend using the LZO compressor, as it is one of the fastest I've seen (at least until Blosc is ready): it can compress data up to 5x faster than it can be written to disk (depending on how compressible the data is and the speed of the disk subsystem). Of course, if the data is not compressible at all, then this approach doesn't make a lot of sense.

HTH,

--
Francesc Alted
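[A quick way to check whether your data is compressible enough for this to pay off — plain zlib here as a stand-in for LZO, since the ratio test is the same idea; the smooth synthetic signal below is made up for illustration.]

```python
import numpy as np
import zlib

# A smooth, sampled-signal-like int16 array: the "compressible" case.
t = np.arange(100000)
signal = (1000 * np.sin(t / 500.0)).astype(np.int16)

raw = signal.tobytes()
ratio = len(raw) / len(zlib.compress(raw, 1))  # level 1: favor speed over ratio

# Much above 1x means compression buys you RAM or disk bandwidth;
# near 1x (e.g. for noisy data) means this approach won't help.
print(ratio > 1.5)
```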
Re: [Numpy-discussion] Huge arrays
On Wednesday 09 September 2009 07:22:33, David Cournapeau wrote:
> On Wed, Sep 9, 2009 at 2:10 PM, Sebastian Haase wrote:
>> Hi,
>> you can probably use PyTables for this. Even though it's meant to
>> save/load data to/from disk (in HDF5 format) as far as I understand,
>> it can be used to make your task solvable - even on a 32bit system !!
>> It's free (pytables.org) -- so maybe you can try it out and tell me if
>> I'm right
>
> You still would not be able to load a numpy array > 2 GB. Numpy's memory
> model needs one contiguously addressable chunk of memory for the data,
> which is limited under 32-bit archs. This cannot be overcome in any way
> AFAIK.
>
> You may be able to save data > 2 GB by appending several chunks < 2 GB
> to disk - maybe pytables supports this if it has large file support
> (which enables writing files > 2 GB on a 32-bit system).

Yes, this latter is supported in PyTables as long as the underlying filesystem supports files > 2 GB, which is very usual in modern operating systems. This even works on 32-bit systems, as the indexing machinery in Python has been completely replaced inside PyTables.

However, I think that what Daniel is trying to achieve is to keep all the info in memory, because writing it to disk is too slow. I also agree with your suggestion to use a 64-bit OS (or 32-bit Linux, which can address the full 3 GB right out of the box, as Chuck said) as the way to go.

OTOH, having the possibility to manage compressed data buffers transparently in NumPy would help here, but we are not there yet ;-)

--
Francesc Alted
Re: [Numpy-discussion] Huge arrays
On Wed, Sep 9, 2009 at 2:10 PM, Sebastian Haase wrote:
> Hi,
> you can probably use PyTables for this. Even though it's meant to
> save/load data to/from disk (in HDF5 format) as far as I understand,
> it can be used to make your task solvable - even on a 32bit system !!
> It's free (pytables.org) -- so maybe you can try it out and tell me if
> I'm right

You still would not be able to load a numpy array > 2 GB. Numpy's memory model needs one contiguously addressable chunk of memory for the data, which is limited under 32-bit archs. This cannot be overcome in any way AFAIK.

You may be able to save data > 2 GB by appending several chunks < 2 GB to disk - maybe pytables supports this if it has large file support (which enables writing files > 2 GB on a 32-bit system).

cheers,

David
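[A numpy-only sketch of that chunk-append pattern, using tiny sizes for illustration. With real data each chunk would be well under 2 GB, and on 32-bit you would map the file in windows via memmap's offset argument rather than all at once; the file name here is made up.]

```python
import numpy as np
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunks.dat")
n_chunks, chunk_shape = 4, (256, 200)

# Append chunks one at a time: no single in-memory array ever holds
# the whole dataset.
with open(path, "ab") as f:
    for i in range(n_chunks):
        chunk = np.full(chunk_shape, i, dtype=np.int16)  # stand-in for measured data
        chunk.tofile(f)

# Read it back lazily: memmap views the file on disk instead of
# loading it into RAM.
data = np.memmap(path, dtype=np.int16, mode="r",
                 shape=(n_chunks * chunk_shape[0], chunk_shape[1]))
print(data[0, 0], data[-1, -1])  # -> 0 3 (first chunk's value, last chunk's value)
```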
Re: [Numpy-discussion] Huge arrays
Hi,
you can probably use PyTables for this. Even though it's meant to save/load data to/from disk (in HDF5 format), as far as I understand it can be used to make your task solvable - even on a 32-bit system!! It's free (pytables.org) -- so maybe you can try it out and tell me if I'm right. Or someone else here would know right away...

Cheers,
Sebastian Haase

On Wed, Sep 9, 2009 at 6:19 AM, Sturla Molden wrote:
> Daniel Platz wrote:
>> data1 = numpy.zeros((256,2000000),dtype=int16)
>> data2 = numpy.zeros((256,2000000),dtype=int16)
>>
>> This works for the first array data1. However, it returns with a
>> memory error for array data2. I have read somewhere that there is a
>> 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
>> be below that? I use Windows XP Pro 32 bit with 3GB of RAM.
>
> There is a 2 GB limit for user space on Win32; in practice about 1.9 GB
> of it is usable. You have other programs running as well, so this is
> still too much. Also Windows reserves 50% of RAM for itself, so you have
> less than 1.5 GB to play with.
>
> S.M.
Re: [Numpy-discussion] Huge arrays
Daniel Platz wrote:
> data1 = numpy.zeros((256,2000000),dtype=int16)
> data2 = numpy.zeros((256,2000000),dtype=int16)
>
> This works for the first array data1. However, it returns with a
> memory error for array data2. I have read somewhere that there is a
> 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
> be below that? I use Windows XP Pro 32 bit with 3GB of RAM.

There is a 2 GB limit for user space on Win32; in practice about 1.9 GB of it is usable. You have other programs running as well, so this is still too much. Also Windows reserves 50% of RAM for itself, so you have less than 1.5 GB to play with.

S.M.
Re: [Numpy-discussion] Huge arrays
On Tue, Sep 8, 2009 at 7:30 PM, Daniel Platz <mail.to.daniel.pl...@googlemail.com> wrote:
> Hi,
>
> I have a numpy newbie question. I want to store a huge amount of data
> in an array. These data come from a measurement setup and I want to
> write them to disk later since there is nearly no time for this during
> the measurement. To put some numbers up: I have 2*256*2000000 int16
> numbers which I want to store. I tried
>
> data1 = numpy.zeros((256,2000000),dtype=int16)
> data2 = numpy.zeros((256,2000000),dtype=int16)
>
> This works for the first array data1. However, it returns with a
> memory error for array data2. I have read somewhere that there is a
> 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
> be below that? I use Windows XP Pro 32 bit with 3GB of RAM.

More precisely, 2 GB for Windows and 3 GB for (non-PAE enabled) Linux. The rest of the address space is set aside for the operating system. Note that address space is not the same as physical memory, but it sets a limit on what you can use, whether swap or real memory.

Chuck
Re: [Numpy-discussion] Huge arrays
On Wed, Sep 9, 2009 at 9:30 AM, Daniel Platz wrote:
> Hi,
>
> I have a numpy newbie question. I want to store a huge amount of data
> in an array. These data come from a measurement setup and I want to
> write them to disk later since there is nearly no time for this during
> the measurement. To put some numbers up: I have 2*256*2000000 int16
> numbers which I want to store. I tried
>
> data1 = numpy.zeros((256,2000000),dtype=int16)
> data2 = numpy.zeros((256,2000000),dtype=int16)
>
> This works for the first array data1. However, it returns with a
> memory error for array data2. I have read somewhere that there is a
> 2GB limit for numpy arrays on a 32 bit machine

This has nothing to do with numpy per se - it is a fundamental limitation of 32-bit architectures. Each of your arrays is 1024 MB, so you won't be able to create two of them. The 2 GB limit is a theoretical upper limit, and in practice it will always be lower, if only because Python itself needs some memory. There is also the memory fragmentation problem, which means allocating one contiguous, almost-2 GB segment will be difficult.

> If someone has an idea to help me I would be very glad.

If you really need to deal with arrays that big, you should move to a 64-bit architecture. That is exactly the problem it solves.

cheers,

David
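[The arithmetic behind that 1024 MB figure, spelled out with the shape and dtype from Daniel's post:]

```python
import numpy as np

shape = (256, 2000000)
itemsize = np.dtype(np.int16).itemsize  # 2 bytes per int16
nbytes = shape[0] * shape[1] * itemsize
print(nbytes)      # 1024000000 -- i.e. 1024 MB per array
print(2 * nbytes)  # 2048000000 -- two of them nearly fill the ~2 GB of
                   # user address space a Win32 process gets, which is
                   # why the second allocation fails once Python's own
                   # memory use and fragmentation are accounted for
```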
[Numpy-discussion] Huge arrays
Hi,

I have a numpy newbie question. I want to store a huge amount of data in an array. These data come from a measurement setup, and I want to write them to disk later since there is nearly no time for this during the measurement. To put some numbers up: I have 2*256*2000000 int16 numbers which I want to store. I tried

data1 = numpy.zeros((256,2000000),dtype=int16)
data2 = numpy.zeros((256,2000000),dtype=int16)

This works for the first array data1. However, it returns with a memory error for array data2. I have read somewhere that there is a 2GB limit for numpy arrays on a 32 bit machine, but shouldn't I still be below that? I use Windows XP Pro 32 bit with 3GB of RAM.

If someone has an idea to help me I would be very glad. Thanks in advance.

Daniel