Gigantic file size processing error
Hi, we have a huge file (500MB) that we need to manipulate: perform some replacements and then write the file back out. I have used File::Slurp, and it works for a 300MB file (thanks Uri), but for this huge 500MB file it does not finish and exits with an error. I have also tried the Tie::File module, with the same result. Any guidance?

regards
Manikandan
Re: Gigantic file size processing error
On Thu, 2 Jan 2014 23:21:22 +0800 (SGT) mani kandan mani_nm...@yahoo.com wrote:

> Hi, we have a huge file (500MB) that we need to manipulate: some
> replacements, then write the file back out. I have used File::Slurp
> and it works for a 300MB file (thanks Uri), but for this huge 500MB
> file it exits with an error. I have also tried the Tie::File module,
> with the same result. Any guidance?

Firstly, be specific - "comes out with an error" doesn't help us. What is the error?

Secondly, do you need to work on the file as a whole, or can you just loop over it, making changes, and writing them back out? In other words, do you *need* to hold the whole file in memory at one time? More often than not, you don't.

If it's per-line changes, then File::Slurp::edit_file_lines should work, for example:

    use File::Slurp qw(edit_file_lines);
    my $filename = '/tmp/foo';
    edit_file_lines(sub { s/badger/mushroom/g }, $filename);

The above would, of course, replace every occurrence of 'badger' with 'mushroom' in the file.

Cheers

Dave P

--
David Precious ("bigpresh") dav...@preshweb.co.uk
http://www.preshweb.co.uk/
www.preshweb.co.uk/twitter | www.preshweb.co.uk/linkedin
www.preshweb.co.uk/facebook | www.preshweb.co.uk/cpan | www.preshweb.co.uk/github

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
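[Editor's note: a minimal sketch of the "loop over it" approach described above: read line by line, apply the substitution, and write to a second file, so the whole 500MB never sits in memory. The filenames and sample data here are illustrations only, not from the thread.]

```perl
use strict;
use warnings;

# Create a small sample input so the sketch runs end to end.
my $in_name  = 'sample_in.txt';
my $out_name = 'sample_out.txt';
open my $make, '>', $in_name or die "Cannot create $in_name: $!";
print {$make} "a badger here\nno change\n";
close $make;

open my $in,  '<', $in_name  or die "Cannot open $in_name: $!";
open my $out, '>', $out_name or die "Cannot open $out_name: $!";

while (my $line = <$in>) {
    $line =~ s/badger/mushroom/g;   # the per-line replacement
    print {$out} $line;             # write each result straight out
}

close $in;
close $out or die "Cannot close $out_name: $!";
```

Only one line is held in memory at a time, so this scales to files far larger than available RAM.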
Re: Gigantic file size processing error
On 01/02/2014 10:39 AM, David Precious wrote:

> If it's per-line changes, then File::Slurp::edit_file_lines should
> work, for example:
>
>     use File::Slurp qw(edit_file_lines);
>     my $filename = '/tmp/foo';
>     edit_file_lines(sub { s/badger/mushroom/g }, $filename);

If there is a size issue, that would be just as bad as slurping in the whole file, and it would use even more storage, as it will be an array of all the lines internally.

Slurping in 500MB is not a smart thing unless you have many gigs of free RAM. Otherwise it will just be going to disk on the swap, and you don't gain much other than simpler logic.

But I agree: knowing the error message, and who is generating it, will be valuable. It could be a virtual RAM limitation on the OS, which can be changed with the ulimit utility (or BSD::Resource, if you have that module).

uri

--
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
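[Editor's note: a hedged sketch of checking the process resource limit Uri mentions. BSD::Resource is a CPAN module and may not be installed, so this probes for it at runtime rather than assuming it; the fallback message is an illustration.]

```perl
use strict;
use warnings;

my $msg;
if (eval { require BSD::Resource; 1 }) {
    BSD::Resource->import(qw(getrlimit RLIMIT_AS));
    # Soft and hard limits on the process address space, in bytes
    # (RLIM_INFINITY if unlimited).
    my ($soft, $hard) = getrlimit(RLIMIT_AS());
    $msg = "address-space limit: soft=$soft hard=$hard";
}
else {
    # Module not available; the shell's ulimit builtin shows the same info.
    $msg = "BSD::Resource not installed; try 'ulimit -v' in the shell";
}
print "$msg\n";
```

If the soft limit is well under the file size plus Perl's overhead, slurping will fail regardless of how much physical RAM the machine has.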
Re: Gigantic file size processing error
On Thu, 02 Jan 2014 11:18:31 -0500 Uri Guttman u...@stemsystems.com wrote:

> if there is a size issue, that would be just as bad as slurping in
> the whole file, and it would use even more storage, as it will be an
> array of all the lines internally.

Oh, my mistake: I'd believed that edit_file_lines edited the file line by line, writing the results to a temporary file and then renaming the temporary file over the original at the end.

In that case, I think the docs are a little unclear:

    These subs read in a file into $_, execute a code block which
    should modify $_, and then write $_ back to the file. The
    difference between them is that edit_file reads the whole file
    into $_ and calls the code block one time. With edit_file_lines,
    each line is read into $_ and the code is called for each line...

and

    These subs are the equivalent of the -pi command line options of
    Perl...

To me, that sounds like edit_file_lines reads a line at a time rather than slurping the whole lot, but looking at the code, it does indeed read the entire file contents into RAM. (I probably should have expected anything in File::Slurp to, well, slurp the file... :) )

Part of me wonders if File::Slurp should provide an in-place (not slurping into RAM) editing feature which works like edit_file_lines but line-by-line using a temp file, but that's probably feature creep :)

OP - what didn't work about Tie::File?
--
David Precious ("bigpresh") dav...@preshweb.co.uk
Re: Gigantic file size processing error
On 01/02/2014 11:48 AM, David Precious wrote:

> Oh, my mistake: I'd believed that edit_file_lines edited the file
> line by line, writing the results to a temporary file and then
> renaming the temporary file over the original at the end. In that
> case, I think the docs are a little unclear...

Good point. I should emphasize that it does slurp in the file. Tie::File only reads in chunks and moves around as you access elements; edit_file_lines slurps into an array and loops over those elements, aliasing each one to $_. It definitely eats its own dog food!

> (I probably should have expected anything in File::Slurp to, well,
> slurp the file... :) )

As I said, dog food is good! :) I wrote edit_file and edit_file_lines as interesting wrappers around read_file and write_file. I assumed it was obvious they used those slurp functions.

> Part of me wonders if File::Slurp should provide an in-place (not
> slurping into RAM) editing feature which works like edit_file_lines
> but line-by-line using a temp file, but that's probably feature
> creep :)

That IS Tie::File, which I didn't want for efficiency reasons: it has to read/write back and forth every time you modify an element. edit_file (and _lines) are meant to be fast and simple to use for common editing of files. As with slurping, I didn't expect them to be used on .5GB files! :)

uri

--
Uri Guttman - The Perl Hunter
Re: Gigantic file size processing error
On Thu, 02 Jan 2014 11:56:26 -0500 Uri Guttman u...@stemsystems.com wrote:

> That IS Tie::File, which I didn't want for efficiency reasons: it has
> to read/write back and forth every time you modify an element.
> edit_file (and _lines) are meant to be fast and simple to use for
> common editing of files. As with slurping, I didn't expect them to be
> used on .5GB files! :)

Oh, I was thinking of a wrapper that would:

(a) open a new temp file
(b) iterate over the source file, line by line, calling the provided coderef for each line
(c) write $_ (potentially modified by the coderef) to the temp file
(d) finally, rename the temp file over the source file

Of course, it's pretty easy to write such code yourself, and as it doesn't slurp the file in, it could be considered out of place in File::Slurp. I'd be fairly surprised if such a thing doesn't already exist on CPAN, too. (If it didn't, I might actually write such a thing, as a beginner-friendly "here's how to easily modify a file, line by line, with minimal effort" offering.)

--
David Precious ("bigpresh") dav...@preshweb.co.uk
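[Editor's note: the four steps above can be sketched as a small helper. The name edit_file_by_line is hypothetical, not part of File::Slurp or any CPAN module; the demo filename and data are illustrations.]

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Copy qw(move);

sub edit_file_by_line {
    my ($code, $filename) = @_;

    open my $in, '<', $filename or die "Cannot open $filename: $!";
    my ($out, $tmpname) = tempfile();      # (a) open a new temp file

    while (<$in>) {                        # (b) iterate line by line
        $code->();                         #     coderef may modify $_
        print {$out} $_;                   # (c) write result to temp file
    }

    close $in;
    close $out or die "Cannot close $tmpname: $!";
    move($tmpname, $filename)              # (d) replace the original
        or die "Cannot rename $tmpname over $filename: $!";
}

# Demo: build a small file, then edit it in place.
my $file = 'edit_demo.txt';
open my $fh, '>', $file or die "Cannot create $file: $!";
print {$fh} "badger one\nbadger two\n";
close $fh;

edit_file_by_line(sub { s/badger/mushroom/g }, $file);
```

Memory use stays at one line regardless of file size, and the rename at the end means readers never see a half-edited file.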
Re: Gigantic file size processing error
On 01/02/2014 12:08 PM, David Precious wrote:

> Oh, I was thinking of a wrapper that would: (a) open a new temp file;
> (b) iterate over the source file, line by line, calling the provided
> coderef for each line; (c) write $_ (potentially modified by the
> coderef) to the temp file; (d) finally, rename the temp file over the
> source file.

It wouldn't be a bad addition to File::Slurp. Call it something like edit_file_loop. If you write it, I will add it to the module; you can likely steal the code from edit_file_lines and modify that. I would document it as an alternative to edit_file_lines for very large files. It will need POD, test files, and good comments for me to add it. Credit will be given :)

thanx,

uri

--
Uri Guttman - The Perl Hunter
Re: Gigantic file size processing error
On 01/02/2014 12:33 PM, David Precious wrote:

> Righto - I'll add it to my list of things awaiting tuit resupply :)

Who is your tuit supplier? I am looking for a better and cheaper one.

uri

--
Uri Guttman - The Perl Hunter
Re: Gigantic file size processing error
On Thu, 02 Jan 2014 12:19:16 -0500 Uri Guttman u...@stemsystems.com wrote:

> It wouldn't be a bad addition to File::Slurp. Call it something like
> edit_file_loop. If you write it, I will add it to the module. [...]
> Credit will be given :)

Righto - I'll add it to my list of things awaiting tuit resupply :)

--
David Precious ("bigpresh") dav...@preshweb.co.uk
Re: Recursive Validation Function
Hello again,

I wasn't able to continue with the project I was working on way back in September, but I wanted to send a quick note to thank everyone (Andy Bach, Jim Gibson, Rob Dixon and Mike Flannigan) for their help. I apologize for not following up sooner. Hopefully I'll have some time to get back to it some day soon; I was a little optimistic in my planning. Thanks again!

On Sep 4, 2013, at 10:55 PM, beginners-digest-h...@perl.org wrote:

beginners Digest, 5 Sep 2013, Issue 4576. Topics (messages 123410 through 123433) include Re: Recursive Validation Function: 123410 by Andy Bach, 123411 by Jim Gibson, 123412 and 123413 by Rob Dixon, 123427 by Mike Flannigan.

From: Andy Bach afb...@gmail.com
Date: September 2, 2013 8:43:59 PM CDT
Subject: Re: Recursive Validation Function

On Monday, September 2, 2013, John Aten wrote:

> my $valid_token = validate_tokens($token);
>
> Too bad it doesn't work! Even if I put in valid tokens on the first
> shot, there are errors.

You're passing the token as a parameter to the sub, but the sub is using $_ for the match. You need to assign the parameter inside the sub. Arguments arrive in the global @_ array; one idiom is:

    sub validate_tokens {
        my ($test_token) = @_;
        ...
    }

This assigns the first passed parameter to $test_token. The advantage is that if you add more params, you can just insert a new variable in the left-hand list:

    my ($test_token, $empty_allowed, $cleanup) = @_;

Now use $test_token instead of $_.

> I am writing a script to rename files. The script prompts the user to
> enter tokens which will comprise part of the file name. These are
> made of letters and numbers separated by a dot, e.g. R.3. The letters
> can be R, I or C, upper or lowercase. The first number can be one
> through eight, and if it is a six there can be an additional dot
> followed by a one or two.
> To make sure that I don't make any mistakes when putting in these
> tokens, I have tried to write a function to make sure that they are
> valid:
>
>     sub validate_tokens {
>         if ($_ !~ /^[ric]\.[1234578]$/i) {
>             if ($_ !~ /^[ric]\.6(\.[1|2])?$/i) {
>                 print "Enter valid tokens: ";
>                 my $new_token = <STDIN>;
>                 chomp $new_token;
>                 validate_tokens($new_token);
>             }
>         }
>         return $_;
>     }
>
> I'm calling it like this:
>
>     print "Enter Tokens: ";
>     my $token = <STDIN>;
>     chomp $token;
>     my $valid_token = validate_tokens($token);
>
> Too bad it doesn't work! Even if I put in valid tokens on the first
> shot, there are errors:
>
>     Use of uninitialized value $_ in pattern match (m//) at bulk.pl
>     line 65, <STDIN> line 3.
>     Use of uninitialized value $_ in pattern match (m//) at bulk.pl
>     line 66, <STDIN> line 3.
>
> Lines 65 and 66 are the ones with the two if statements, lines 2 and
> 3 of the first code snippet above. Each time I run it, the <STDIN>
> line number in the error message increases by one. The script goes
> into an infinite loop, prompting for valid tokens, but even if valid
> ones are entered it continues. What am I missing?

--
Andy Bach, afb...@gmail.com
608 658-1890 cell, 608 261-5738 wk

From: Jim Gibson jimsgib...@gmail.com
Date: September 3, 2013 12:56:16 AM CDT
Subject: Re: Recursive Validation Function

On Sep 2, 2013, at 6:23 PM, John Aten wrote:

> Hello all, I am writing a script to rename files. The script prompts
> the user to enter tokens which will comprise part of the file name.
> These are made of letters and numbers separated by a dot, e.g. R.3.
> The letters can be R, I or C, upper or lowercase. The first number
> can be one through eight, and if it is a six there can be an
> additional dot followed by a one or two.
> To make sure that I don't make any mistakes when putting in these
> tokens, I have tried to write a function to make sure that they are
> valid:

Use $_[0] instead of $_, or follow the other suggestions from Andy.

>     sub validate_tokens {
>         if ($_ !~ /^[ric]\.[1234578]$/i) {
>             if ($_ !~ /^[ric]\.6(\.[1|2])?$/i) {

The '[1|2]' part of this regular expression is incorrect. Characters in a character class do not need the pipe character for alternation; they are already treated as alternates. This part should be '[12]'. As written, your pattern will also match the string 'r.6|'.

You can combine the two regular expressions by using alternation and non-capturing grouping (clustering):

    if ( $_[0] !~ /^[ric]\.(?:[1-578]|6(?:\.[12])?)$/i ) {
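[Editor's note: a corrected sketch combining the advice above: assign the parameter from @_ rather than reading $_, use the fixed combined regex, and loop until valid instead of recursing (the original also discarded the recursive call's result). The demo calls only pass valid tokens so no STDIN is needed.]

```perl
use strict;
use warnings;

sub validate_tokens {
    my ($token) = @_;    # assign the first argument, per Andy's advice

    # Letter R/I/C (any case), a dot, then 1-5, 7 or 8,
    # or a 6 optionally followed by .1 or .2 (Jim's combined regex).
    while ($token !~ /^[ric]\.(?:[1-578]|6(?:\.[12])?)$/i) {
        print "Enter valid tokens: ";
        $token = <STDIN>;
        chomp $token;
    }
    return $token;       # return the validated token, not $_
}

print validate_tokens('R.3'),   "\n";   # valid on the first try
print validate_tokens('c.6.2'), "\n";   # the six-with-suffix form
```

Because valid input never enters the loop, the "uninitialized value $_" warnings and the infinite prompt from the original version disappear.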