Just a quick note as I leave work: please don't use DATA as a handle to a file! If you must use a bareword handle, please choose any other one. DATA is magical -- Perl pre-opens it to the text following a __DATA__ or __END__ token in your script, and re-opening it clobbers that.
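To illustrate what "magical" means here, this sketch runs a small child script that first reads its own __DATA__ section and then clobbers the handle. It assumes a perl binary on PATH; the payload text is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A child script that reads its own __DATA__ section, then clobbers the
# magic DATA handle by re-opening it on an ordinary file ($0 = itself).
my $child = <<'EOT';
print scalar <DATA>;            # the magic handle: reads "payload"
open(DATA, "<", $0) or die $!;  # clobbers it -- DATA is now the script
print scalar <DATA>;            # reads the script's first line instead
__DATA__
payload
EOT

my ($fh, $file) = tempfile();
print {$fh} $child;
close $fh;

my $out = `perl $file`;         # assumes "perl" is on PATH
print $out;
unlink $file;
```

The second read no longer sees the payload: once the bareword is re-opened, the __DATA__ stream is gone for good.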
Cheers, Joel

On Tue, Feb 14, 2012 at 3:36 PM, MARK BAKER <[email protected]> wrote:
> Hey Clifford,
>
> I was talking with David along the same lines as your problem here, and I
> would like to share something with you that I think might make its way
> into the PDL core in one way or another.
>
> I found a way to access 3 gigabytes of data using only 40 megabytes of
> RAM, like this:
> ##################################################
> open(DATA, "<", "large_file_path_here") or die "open failed: $!";
> my @offset;
> $offset[1] = tell DATA;            # line 1 starts at offset 0
> my $line_num = 2;
> while (<DATA>) {
>     $offset[$line_num++] = tell DATA;
> }
> print "DONE. Please enter a line number: ";
> while (my $entered = <STDIN>) {
>     chomp $entered;
>     seek DATA, $offset[$entered], 0;
>     my $line = <DATA>;
>     print $line, "\n";
> }
> ###################################################
>
> Try this out with a large file: enter a line number and it will bring up
> the information on that line very fast.
>
> The other trick is to convert your file data to decimal and then pack it,
> like this:
> ##################################################
> my $data = pack "w*", @numbers;
> ##################################################
> then just use unpack to view the numerical data. Instead of calling up
> each line, this calls up each paragraph of numerical data, which saves a
> lot of RAM by pulling the information in as packed chunks instead of line
> by line.
>
> Hope that helps,
>
> -Mark
>
> ________________________________
> From: Clifford Sobchuk <[email protected]>
> To: Chris Marshall <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Sent: Tuesday, February 14, 2012 12:36 PM
> Subject: Re: [Perldl] How to find out cause of out of memory
>
> Thanks all. Pre-allocating isn't obvious (to me) as the file, and hence
> the data, are highly variable with no easy way to determine the size.
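As an aside on Mark's pack "w" trick above: "w" is Perl's BER compressed-integer template, so each non-negative integer takes only as many bytes as it needs. A small self-contained round trip (the sample numbers are made up):

```perl
use strict;
use warnings;

# "w" packs non-negative integers in BER compressed form: small numbers
# take one byte, larger ones only as many bytes as they need.
my @numbers = (0, 7, 300, 1_000_000);
my $packed  = pack "w*", @numbers;

# 0 and 7 take one byte each, 300 takes two, 1_000_000 takes three.
printf "packed %d numbers into %d bytes\n", scalar @numbers, length $packed;

# unpack restores the original list.
my @restored = unpack "w*", $packed;
print "@restored\n";   # "0 7 300 1000000"
```

Four numbers end up in seven bytes, versus 32 bytes as native doubles, and far more than that as individual Perl scalars.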
> I do think that it is the conversion from perl array to pdl, as I am
> guessing that the entire perl array has to be loaded, which likely causes
> the out of memory. In the "Whirlwind Tour" book there was an example
> showing how to assign an image to a hash with the elements being pdls.
>
> I am unsure how to do this with rgrep or rcols. I have tried:
>
> open ($in, "<", $ARGV[0]) or die "can't open $ARGV[0]: $!\n";
> $fwdGain = rgrep {/\s\d\s+\d+\s+\d\d\d\s+\w+\s+(\d+)\s+\-\d+/} $in;
> open ($in, "<", $ARGV[0]) or die "can't open $ARGV[0]: $!\n";
> %snr = rgrep {/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(\-\d+)/} $in;
>
> to import the data, but it doesn't work because $1 is a word. I made a
> map for it as
>
> my %rate = ( "Full"=>1, "Half"=>0.5, "Quarter"=>0.25, "Eighth"=>0.125 );
>
> and tried to use:
>
> $snr{$rate{$1}} = rgrep {/\s\d\s+\d+\s+\d\d\d\s+(\w+)\s+\d+\s+(\-\d+)/} $in;
>
> But this doesn't work either, as it seems that rgrep is looking for a
> numeric value:
>
> Argument "Eighth" isn't numeric in multiplication (*) at
> C:\strawberry\perl\site\...
>
> Is there a way to use rgrep to put the mapped numeric and the data into a
> hash?
>
> Thanks,
>
> CLIFF SOBCHUK
> Core RF Engineering
> Phone 613-667-1974 ecn: 8109-71974
> mobile 403-819-9233
> yahoo: sobchuk
> www.ericsson.com
>
> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who
> is solely responsible for this email and its contents. All inquiries
> regarding this email should be addressed to Ericsson. The web site for
> Ericsson is www.ericsson.com."
>
> This Communication is Confidential.
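One workaround for rgrep wanting numeric fields is to do the word-to-number translation in plain Perl while scanning, and only hand the resulting numeric arrays to PDL afterwards. A minimal sketch of the translation step; the sample records are made up (guessed from Cliff's regex), and the final pdl() call is left out so this runs without PDL installed:

```perl
use strict;
use warnings;

# Map the rate words to numbers while scanning, then accumulate the SNR
# values per numeric rate; each resulting array could later be handed to
# pdl() to make one piddle per rate.
my %rate = ( Full => 1, Half => 0.5, Quarter => 0.25, Eighth => 0.125 );

my @lines = (       # made-up sample records matching Cliff's regex
    " 1 23 100 Full 42 -7",
    " 2 24 101 Eighth 40 -9",
    " 3 25 102 Full 44 -6",
);

my %snr;
for my $line (@lines) {
    if ($line =~ /\s\d\s+\d+\s+\d{3}\s+(\w+)\s+\d+\s+(-\d+)/) {
        my ($word, $val) = ($1, $2);
        next unless exists $rate{$word};   # skip unknown rate words
        push @{ $snr{ $rate{$word} } }, $val;
    }
}

for my $r (sort { $a <=> $b } keys %snr) {
    print "rate $r: @{ $snr{$r} }\n";
}
```

This keeps the hash-of-arrays structure Cliff already uses, but with purely numeric keys and values, so each array converts cleanly.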
> We only send and receive email on the basis of the terms set out at
> www.ericsson.com/email_disclaimer
>
> -----Original Message-----
> From: Chris Marshall [mailto:[email protected]]
> Sent: Tuesday, February 14, 2012 10:42 AM
> To: Clifford Sobchuk
> Cc: David Mertens; [email protected]
> Subject: Re: [Perldl] How to find out cause of out of memory
>
> Another angle: I can't tell how much of the data you collect in the perl
> hash structures, but they are *much* more memory intensive than the pdl
> data arrays.
>
> Your best chance would be to allocate the destination pdl and then use
> slice assignments to put the hash data into its correct place.
>
> Beware: one issue with perl is that it simply dies when it runs out of
> memory, which is a pain. If you preallocate the big piddle, the crash may
> move into your perl code instead, which could give you an idea of where
> the memory is going.
>
> --Chris
>
> On Tue, Feb 14, 2012 at 11:22 AM, David Mertens <[email protected]>
> wrote:
>> Cliff -
>>
>> Has your client given you some sample data so that you can try to
>> reproduce the error on your own machine? If so, a collection of
>> warnings dumped to a logfile might at least tell you which line of code
>> is croaking.
>>
>> Allocation of large piddles (many hundreds of megabytes) has been
>> reported to be a problem elsewhere. One thing I have done on Linux to
>> work around this is to build a FastRaw file piece by piece and then
>> memory-map the file. Although this is not a possibility on Windows (no
>> PDL support for memory mapping on Windows yet), it might still point
>> toward a solution. You could build a piddle into a FastRaw file with
>> one script, then have a different script readfraw that file. If you
>> pull this file in early in your (second) Perl process, you have a
>> higher likelihood of getting the contiguous memory block that PDL needs
>> for the large data array.
>>
>> I know it's not ideal, but I hope that helps.
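Chris's preallocate-then-fill suggestion (in PDL terms, something like a zeroes() destination filled by slice assignments) can be sketched in pure Perl with a packed string buffer; the sizes and values below are made up, but the pattern is the point:

```perl
use strict;
use warnings;

# Pure-Perl sketch of "preallocate the destination, then fill slices":
# reserve one flat buffer up front instead of growing a Perl array
# (each Perl scalar costs dozens of bytes; a packed double costs 8).
my $n   = 1000;
my $buf = "\0" x (8 * $n);        # room for $n doubles, allocated once

# Fill a chunk in place, the way a slice assignment fills a piddle.
my @chunk = (1.5, 2.5, 3.5);
my $at    = 10;                   # element offset to write at
substr($buf, 8 * $at, 8 * @chunk, pack("d*", @chunk));

# Read the slice back out.
my @got = unpack "d*", substr($buf, 8 * $at, 8 * @chunk);
print "@got\n";                   # "1.5 2.5 3.5"
```

The buffer never grows or moves, so the per-chunk cost is just the chunk itself, which is exactly what the slice-assignment approach buys you in PDL.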
>> I should probably try to figure out how to add memory-mapping support
>> on Windows and then document this technique so that others can use it.
>>
>> For building the FastRaw file, I can dig up some sample code and send
>> it along if that would help, but I won't be able to get to it until
>> tonight at the earliest (and I make no guarantees, as it's Valentine's
>> Day :-)
>>
>> David
>>
>>
>> On Tue, Feb 14, 2012 at 9:26 AM, Clifford Sobchuk
>> <[email protected]> wrote:
>>>
>>> Hi Folks,
>>>
>>> I am running into a problem where I am reading in a large amount of
>>> data (variable, depending on log size). The data is pushed into a
>>> perl array and then converted into a piddle. I think the problem
>>> might be the conversion from perl array to piddle, but I am not sure.
>>> How can I find out where the issue lies and correct it? The end
>>> user's computer (a laptop) will apparently often be in this situation.
>>> Since the data is intermixed with text that is needed to hash each
>>> specific attribute, I can't simply use an rgrep or rcols import. I
>>> could use rcols for each section, but this would mean using glue to
>>> build up the piddle slowly (in groups of 20 to 100, depending on the
>>> datum for that attribute).
>>>
>>> Example pseudo-code:
>>>
>>>   foreach line {
>>>       $index1 = $1 if (/index1:\s(\d+)\w+/);
>>>       $index2 ...
>>>       if ($datastart && !$dataend) {
>>>           push @{$myhash{$index1}{$index2}{datum1}}, $1 if (/mydata/);
>>>           $dataend = 1 if (/$eod/);
>>>       }
>>>   }
>>>   foreach (sort keys %myhash) {
>>>       ... for each index
>>>       $data1 = pdl(@{$myhash{$index1}{$index2}{datum1}});
>>>   }
>>>
>>> The raw text files are on the order of 0.5 to 14 GB and are being
>>> processed on win32 (Vista, which I know has a 2 GB limit per
>>> application). Hope that this provides enough information to scope the
>>> issue.
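Until David digs up his sample code, here is a pure-Perl sketch of the piece-by-piece idea he describes: append packed chunks to a raw file so that no single Perl array ever holds the whole data set. (PDL::IO::FastRaw's writefraw/readfraw/mapfraw work with a raw data file like this plus a small header; the chunk values below are made up.)

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a raw file of doubles chunk by chunk: each chunk is packed,
# written, and then freed, so memory use stays at one chunk's worth.
my ($fh, $file) = tempfile();
binmode $fh;
for my $chunk ([1, 2, 3], [4, 5], [6]) {
    print {$fh} pack "d*", @$chunk;
}
close $fh;

# Read it back (readfraw/mapfraw would hand this to PDL without copying
# it into individual Perl scalars).
open my $in, "<:raw", $file or die $!;
my $data = do { local $/; <$in> };    # slurp the whole raw file
close $in;
my @all = unpack "d*", $data;
print "@all\n";                       # "1 2 3 4 5 6"
unlink $file;
```

A second script that maps or reads the finished file early in its life has a better chance of getting the one contiguous allocation PDL needs, which is the heart of David's suggestion.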
>>> Thanks,
>>>
>>> CLIFF SOBCHUK
>>> Ericsson
>>> Core RF Engineering
>>> Calgary, AB, Canada
>>> Phone 613-667-1974 ECN 8109 x71974
>>> Mobile 403-819-9233
>>> [email protected]
>>> yahoo: sobchuk
>>> http://www.ericsson.com/
>>>
>>> _______________________________________________
>>> Perldl mailing list
>>> [email protected]
>>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>
>>
>> --
>> "Debugging is twice as hard as writing the code in the first place.
>> Therefore, if you write the code as cleverly as possible, you are,
>> by definition, not smart enough to debug it." -- Brian Kernighan
