P Kishor wrote:
> Here is a large data structure --
>
> my $pdl = pdl (
>       (
>               1 .. 10,
>               ( [1 .. 33] ) x $d,  # $d arrays of 33 elements
>               ( [1 .. 57] ) x $l,  # $l arrays of 57 elements
>               ( [1 .. 9  ] ) x $m, # $m arrays of 9 elements
>       )
> );
>
> $d is BETWEEN 0 and 5
> $l is BETWEEN 1 and 10
> $m is 7300, but could be as high as 18,000 or 20,000
>
> my $size = howbig($pdl->get_datatype);
> print "size of pdl is: $size\n";
> print $pdl->info("Type: %T Dim: %-15D State: %S"), "\n";
> my $n = $pdl->nelem;
> print "There are $n elements in the piddle\n";
>
> I get the following --
>
> size of pdl is: 8
> Type: Double Dim: D [57,7300,13] State: P
> There are 54093000 elements in the piddle
>
> Makes sense so far, but what does that "size of pdl is: 8" mean?
> Surely, that is not the number of bytes being used by this data
> structure?
Of course not.  The docs say that howbig 'Returns the size of a piddle 
datatype in bytes.'  You have a piddle of type double.  Doubles take 8 
bytes each.
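
If you want the actual data footprint rather than the per-element size, multiply `howbig` by `nelem`. A minimal sketch (using a piddle of the same shape as yours):

```perl
use PDL;

my $pdl   = zeroes( double, 57, 7300, 13 );   # same shape/type as your piddle
my $elem  = howbig( $pdl->get_datatype );     # 8 bytes for a double
my $total = $elem * $pdl->nelem;              # bytes of data in the piddle
printf "%d bytes/elem, %.1f MB total\n", $elem, $total / 2**20;
```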

>  By my calculations, the data structure weighs in at about
> 450 KB packed as a Storable object. By the way... in the pseudo code
> above, I have shown the number of elements in the arrays, not the
> actual values. So, for example, in each of the 'd' arrays, there are
> 33 elements, but only about 4 or 5 of them are INTEGERS, the rest
> being REAL numbers. This is useful to get a sense of the size of the
> data structure.
>   
Perhaps useful to people, but not so useful to PDL.  If you have a 
five-element piddle and 4 elements are integers and 1 is a double, then 
the whole thing is promoted to double.  The efficiency of PDL is derived 
mainly from knowing the byte-size of the elements of a piddle a priori.  
If you want to mix ints and doubles like this, you probably need to 
rethink your data structure.  You can use plain old Perl lists, which 
don't require uniform typing, but the overhead will probably kill you.  
A hash or list of piddles is also an option to consider.
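
One way that split might look, as a rough sketch (the names and shapes here are hypothetical, just following the counts in your post): keep each uniformly-typed slab in its own piddle and bundle them in a plain Perl hash, so PDL can still vectorize over each one.

```perl
use PDL;

# Hypothetical layout: one uniformly-typed piddle per group of arrays.
my %data = (
    header => pdl( long, 1 .. 10 ),       # the 10 leading values, as ints
    d      => zeroes( double, 33, 5 ),    # up to 5 'd' arrays of 33 elements
    l      => zeroes( double, 57, 10 ),   # up to 10 'l' arrays of 57 elements
    m      => zeroes( double, 9, 7300 ),  # 7300 'm' arrays of 9 elements
);
# Each piddle is a single type, so there is no silent promotion to double
# of values that could have been stored as 4-byte (or smaller) integers.
```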
> Now, this data structure is the data for computation that is applied
> to a large array, say, 1000 x 1000 or even 1500 x 1800, so between a
> million to a couple of million or more elements, on a cell by cell
> basis. Imagine applying f(d) to the array where d is data structure,
> with f(d) being applied to each cell individually.
>
> Curious to test the limits of my machine and PDL, I tried to create a
> piddle that held 1000_000 such structures, I got a 'bus error'. At an
> array with 100 elements, I got a segmentation fault. At an array with
> 10 elements, it worked.
>   
And probably with the 10^6 example you got a computer brought to its 
knees trying to allocate 40 TB of memory.  If I understand you correctly 
(let me know if I don't), you want to create a super-piddle that is (to 
use your examples here) 10^6 by 57 by 7300 by 13.  Simple calculation 
shows that the base piddle $pdl is 41 MB, so if you want a million of 
these you need 41 million MB of memory somewhere.  10 of those is not 
such a big problem, 100 might work if you have several GB of memory, but 
10^6 is just crazy.  Probably need to rethink how you're doing things there.
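
For the record, the back-of-envelope arithmetic is just element count times `howbig`:

```perl
# Base piddle: 57 x 7300 x 13 doubles at 8 bytes each
my $bytes = 57 * 7300 * 13 * 8;
printf "base piddle: %.1f MB\n", $bytes / 2**20;               # ~41 MB
printf "x 10^6:      %.1f TB\n", $bytes * 1_000_000 / 2**40;   # ~39 TB
```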

Derek

> I am seeking some suggestions on how to work with such data using PDL.
>
> Many thanks,
>


_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
