Just a quick note as I leave work: please don't use DATA as a handle to a file! If you must use a bareword handle, please choose any other one. DATA is magical -- Perl pre-opens it to the text following a __DATA__ or __END__ token in your script, and re-opening it clobbers that.
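To illustrate what "magical" means here, this sketch runs a small child script that first reads its own __DATA__ section and then clobbers the handle. It assumes a perl binary on PATH; the payload text is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A child script that reads its own __DATA__ section, then clobbers the
# magic DATA handle by re-opening it on an ordinary file ($0 = itself).
my $child = <<'EOT';
print scalar <DATA>;            # the magic handle: reads "payload"
open(DATA, "<", $0) or die $!;  # clobbers it -- DATA is now the script
print scalar <DATA>;            # reads the script's first line instead
__DATA__
payload
EOT

my ($fh, $file) = tempfile();
print {$fh} $child;
close $fh;

my $out = `perl $file`;         # assumes "perl" is on PATH
print $out;
unlink $file;
```

The second read no longer sees the payload: once the bareword is re-opened, the __DATA__ stream is gone for good.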
Cheers, Joel

On Tue, Feb 14, 2012 at 3:36 PM, MARK BAKER <[email protected]> wrote:
> Hey Clifford,
>
> I was talking with David along the same lines as your problem here, and I
> would like to share something with you that I think might make its way
> into the PDL core in one way or another.
>
> I found a way to access 3 gigabytes of data using only 40 megabytes of
> RAM, like this:
> ##################################################
> open(DATA, "<", "large_file_path_here") or die "open failed: $!";
> my @offset;
> $offset[1] = tell DATA;            # line 1 starts at offset 0
> my $line_num = 2;
> while (<DATA>) {
>     $offset[$line_num++] = tell DATA;
> }
> print "DONE. Please enter a line number: ";
> while (my $entered = <STDIN>) {
>     chomp $entered;
>     seek DATA, $offset[$entered], 0;
>     my $line = <DATA>;
>     print $line, "\n";
> }
> ###################################################
>
> Try this out with a large file: enter a line number and it will bring up
> the information on that line very fast.
>
> The other trick is to convert your file data to decimal and then pack it,
> like this:
> ##################################################
> my $data = pack "w*", @numbers;
> ##################################################
> then just use unpack to view the numerical data. Instead of calling up
> each line, this calls up each paragraph of numerical data, which saves a
> lot of RAM by pulling the information in as packed chunks instead of line
> by line.
>
> Hope that helps,
>
> -Mark
>
> ________________________________
> From: Clifford Sobchuk <[email protected]>
> To: Chris Marshall <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Sent: Tuesday, February 14, 2012 12:36 PM
> Subject: Re: [Perldl] How to find out cause of out of memory
>
> Thanks all. Pre-allocating isn't obvious (to me) as the file, and hence
> the data, are highly variable with no easy way to determine the size.
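As an aside on Mark's pack "w" trick above: "w" is Perl's BER compressed-integer template, so each non-negative integer takes only as many bytes as it needs. A small self-contained round trip (the sample numbers are made up):

```perl
use strict;
use warnings;

# "w" packs non-negative integers in BER compressed form: small numbers
# take one byte, larger ones only as many bytes as they need.
my @numbers = (0, 7, 300, 1_000_000);
my $packed  = pack "w*", @numbers;

# 0 and 7 take one byte each, 300 takes two, 1_000_000 takes three.
printf "packed %d numbers into %d bytes\n", scalar @numbers, length $packed;

# unpack restores the original list.
my @restored = unpack "w*", $packed;
print "@restored\n";   # "0 7 300 1000000"
```

Four numbers end up in seven bytes, versus 32 bytes as native doubles, and far more than that as individual Perl scalars.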
> I do think that it is the conversion from perl array to pdl, as I am
> guessing that the entire perl array has to be loaded, which likely causes
> the out of memory. In the "Whirlwind Tour" book there was an example
> showing how to assign an image to a hash with the elements being pdls.
>
> I am unsure how to do this with rgrep or rcols. I have tried:
>
> open ($in, "<", $ARGV[0]) or die "can't open $ARGV[0]: $!\n";
> $fwdGain = rgrep {/\s\d\s+\d+\s+\d\d\d\s+\w+\s+(\d+)\s+\-\d+/} $in;
> open ($in, "<", $ARGV[0]) or die "can't open $ARGV[0]: $!\n";
> %snr = rgrep {/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(\-\d+)/} $in;
>
> to import the data, but it doesn't work because $1 is a word. I made a
> map for it as
>
> my %rate = ( "Full"=>1, "Half"=>0.5, "Quarter"=>0.25, "Eighth"=>0.125 );
>
> and tried to use:
>
> $snr{$rate{$1}} = rgrep {/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(\-\d+)/} $in;
>
> But this doesn't work either, as it seems that rgrep is looking for a
> numeric value:
>
> Argument "Eighth" isn't numeric in multiplication (*) at
> C:\strawberry\perl\site\...
>
> Is there a way to use rgrep to put the mapped numeric and the data into a
> hash?
>
> Thanks,
>
> CLIFF SOBCHUK
> Core RF Engineering
> Phone 613-667-1974 ecn: 8109-71974
> mobile 403-819-9233
> yahoo: sobchuk
> www.ericsson.com
>
> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who
> is solely responsible for this email and its contents. All inquiries
> regarding this email should be addressed to Ericsson. The web site for
> Ericsson is www.ericsson.com."
>
> This Communication is Confidential.
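One workaround for rgrep wanting numeric fields is to do the word-to-number translation in plain Perl while scanning, and only hand the resulting numeric arrays to PDL afterwards. A minimal sketch of the translation step; the sample records are made up (guessed from Cliff's regex), and the final pdl() call is left out so this runs without PDL installed:

```perl
use strict;
use warnings;

# Map the rate words to numbers while scanning, then accumulate the SNR
# values per numeric rate; each resulting array could later be handed to
# pdl() to make one piddle per rate.
my %rate = ( Full => 1, Half => 0.5, Quarter => 0.25, Eighth => 0.125 );

my @lines = (       # made-up sample records matching Cliff's regex
    " 1 23 100 Full 42 -7",
    " 2 24 101 Eighth 40 -9",
    " 3 25 102 Full 44 -6",
);

my %snr;
for my $line (@lines) {
    if ($line =~ /\s\d\s+\d+\s+\d{3}\s+(\w+)\s+\d+\s+(-\d+)/) {
        my ($word, $val) = ($1, $2);
        next unless exists $rate{$word};   # skip unknown rate words
        push @{ $snr{ $rate{$word} } }, $val;
    }
}

for my $r (sort { $a <=> $b } keys %snr) {
    print "rate $r: @{ $snr{$r} }\n";
}
```

This keeps the hash-of-arrays structure Cliff already uses, but with purely numeric keys and values, so each array converts cleanly.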
> We only send and receive email on the basis of the terms set out at
> www.ericsson.com/email_disclaimer
>
> -----Original Message-----
> From: Chris Marshall [mailto:[email protected]]
> Sent: Tuesday, February 14, 2012 10:42 AM
> To: Clifford Sobchuk
> Cc: David Mertens; [email protected]
> Subject: Re: [Perldl] How to find out cause of out of memory
>
> Another angle: I can't tell how much of the data you collect in the perl
> hash structures, but they are *much* more memory intensive than the pdl
> data arrays.
>
> Your best chance would be to allocate the destination pdl and then use
> slice assignments to put the hash data into its correct place.
>
> Beware: one issue with perl is that it simply dies when it runs out of
> memory, which is a pain. If you preallocate the big piddle, the crash may
> move into your perl code instead, which could give you an idea of where
> the memory is going.
>
> --Chris
>
> On Tue, Feb 14, 2012 at 11:22 AM, David Mertens <[email protected]>
> wrote:
>> Cliff -
>>
>> Has your client given you some sample data so that you can try to
>> reproduce the error on your own machine? If so, a collection of
>> warnings dumped to a logfile might at least tell you which line of code
>> is croaking.
>>
>> Allocation of large piddles (many hundreds of megabytes) has been
>> reported to be a problem elsewhere. One thing I have done on Linux to
>> work around this is to build a FastRaw file piece by piece and then
>> memory-map the file. Although this is not a possibility on Windows (no
>> PDL support for memory mapping on Windows yet), it might still point
>> toward a solution. You could build a piddle into a FastRaw file with
>> one script, then have a different script readfraw that file. If you
>> pull this file in early in your (second) Perl process, you have a
>> higher likelihood of getting the contiguous memory block that PDL needs
>> for the large data array.
>>
>> I know it's not ideal, but I hope that helps.
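Chris's preallocate-then-fill suggestion (in PDL terms, something like a zeroes() destination filled by slice assignments) can be sketched in pure Perl with a packed string buffer; the sizes and values below are made up, but the pattern is the point:

```perl
use strict;
use warnings;

# Pure-Perl sketch of "preallocate the destination, then fill slices":
# reserve one flat buffer up front instead of growing a Perl array
# (each Perl scalar costs dozens of bytes; a packed double costs 8).
my $n   = 1000;
my $buf = "\0" x (8 * $n);        # room for $n doubles, allocated once

# Fill a chunk in place, the way a slice assignment fills a piddle.
my @chunk = (1.5, 2.5, 3.5);
my $at    = 10;                   # element offset to write at
substr($buf, 8 * $at, 8 * @chunk, pack("d*", @chunk));

# Read the slice back out.
my @got = unpack "d*", substr($buf, 8 * $at, 8 * @chunk);
print "@got\n";                   # "1.5 2.5 3.5"
```

The buffer never grows or moves, so the per-chunk cost is just the chunk itself, which is exactly what the slice-assignment approach buys you in PDL.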
>> I should probably try to figure out how to add memory-mapping support
>> on Windows and then document this technique so that others can use it.
>>
>> For building the FastRaw file, I can dig up some sample code and send
>> it along if that would help, but I won't be able to get to it until
>> tonight at the earliest (and I make no guarantees, as it's Valentine's
>> Day :-)
>>
>> David
>>
>>
>> On Tue, Feb 14, 2012 at 9:26 AM, Clifford Sobchuk
>> <[email protected]> wrote:
>>>
>>> Hi Folks,
>>>
>>> I am running into a problem where I am reading in a large amount of
>>> data (variable, depending on log size). The data is pushed into a
>>> perl array and then converted into a piddle. I think the problem
>>> might be the conversion from perl array to piddle, but I am not sure.
>>> How can I find out where the issue lies and correct it? The end
>>> user's computer (a laptop) will apparently often be in this situation.
>>> Since the data is intermixed with text that is needed to hash each
>>> specific attribute, I can't simply use an rgrep or rcols import. I
>>> could use rcols for each section, but this would mean using glue to
>>> build up the piddle slowly (in groups of 20 to 100, depending on the
>>> datum for that attribute).
>>>
>>> Example pseudo-code:
>>>
>>>   foreach line {
>>>       $index1 = $1 if (/index1:\s(\d+)\w+/);
>>>       $index2 ...
>>>       if ($datastart && !$dataend) {
>>>           push @{$myhash{$index1}{$index2}{datum1}}, $1 if (/mydata/);
>>>           $dataend = 1 if (/$eod/);
>>>       }
>>>   }
>>>   foreach (sort keys %myhash) {
>>>       ... for each index
>>>       $data1 = pdl(@{$myhash{$index1}{$index2}{datum1}});
>>>   }
>>>
>>> The raw text files are on the order of 0.5 to 14 GB and are being
>>> processed on win32 (Vista, which I know has a 2 GB limit per
>>> application). Hope that this provides enough information to scope the
>>> issue.
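Until David digs up his sample code, here is a pure-Perl sketch of the piece-by-piece idea he describes: append packed chunks to a raw file so that no single Perl array ever holds the whole data set. (PDL::IO::FastRaw's writefraw/readfraw/mapfraw work with a raw data file like this plus a small header; the chunk values below are made up.)

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a raw file of doubles chunk by chunk: each chunk is packed,
# written, and then freed, so memory use stays at one chunk's worth.
my ($fh, $file) = tempfile();
binmode $fh;
for my $chunk ([1, 2, 3], [4, 5], [6]) {
    print {$fh} pack "d*", @$chunk;
}
close $fh;

# Read it back (readfraw/mapfraw would hand this to PDL without copying
# it into individual Perl scalars).
open my $in, "<:raw", $file or die $!;
my $data = do { local $/; <$in> };    # slurp the whole raw file
close $in;
my @all = unpack "d*", $data;
print "@all\n";                       # "1 2 3 4 5 6"
unlink $file;
```

A second script that maps or reads the finished file early in its life has a better chance of getting the one contiguous allocation PDL needs, which is the heart of David's suggestion.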
>>> Thanks,
>>>
>>> CLIFF SOBCHUK
>>> Ericsson
>>> Core RF Engineering
>>> Calgary, AB, Canada
>>> Phone 613-667-1974 ECN 8109 x71974
>>> Mobile 403-819-9233
>>> [email protected]
>>> yahoo: sobchuk
>>> http://www.ericsson.com/
>>>
>>> _______________________________________________
>>> Perldl mailing list
>>> [email protected]
>>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>
>>
>> --
>> "Debugging is twice as hard as writing the code in the first place.
>> Therefore, if you write the code as cleverly as possible, you are,
>> by definition, not smart enough to debug it." -- Brian Kernighan
