Gigantic file size processing error

2014-01-02 Thread mani kandan
Hi,

We have a huge file, 500MB in size. We need to manipulate the file - do some
replacements and then write it back out. I have used File::Slurp, and it
works for a file size of 300MB (thanks, Uri), but for this huge 500MB file it
does not process and comes out with an error. I have also used the Tie::File
module - same case, it does not process. Any guidance?

regards
Manikandan

Re: Gigantic file size processing error

2014-01-02 Thread David Precious
On Thu, 2 Jan 2014 23:21:22 +0800 (SGT)
mani kandan <mani_nm...@yahoo.com> wrote:

> Hi,
>
> We have a huge file, 500MB in size. We need to manipulate the file - do
> some replacements and then write it back out. I have used File::Slurp,
> and it works for a file size of 300MB (thanks, Uri), but for this huge
> 500MB file it does not process and comes out with an error. I have also
> used the Tie::File module - same case, it does not process. Any
> guidance?

Firstly, be specific - "come out with error" doesn't help us - what is
the error?

Secondly - do you need to work on the file as a whole, or can you just
loop over it, making changes, and writing them back out?  In other
words, do you *need* to hold the whole file in memory at one time?
More often than not, you don't.
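
For instance, a plain line-by-line pass (an untested sketch - the
filename and the substitution are just placeholders) only ever holds one
line in memory:

  use strict;
  use warnings;

  my $filename = '/tmp/foo';
  open my $in,  '<', $filename       or die "Cannot open $filename: $!";
  open my $out, '>', "$filename.tmp" or die "Cannot open $filename.tmp: $!";
  while (my $line = <$in>) {
      $line =~ s/badger/mushroom/g;  # whatever per-line edit you need
      print {$out} $line;
  }
  close $in;
  close $out or die "Cannot close $filename.tmp: $!";
  rename "$filename.tmp", $filename or die "Cannot rename: $!";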

If it's per-line changes, then File::Slurp::edit_file_lines should work
- e.g.:

  use File::Slurp qw(edit_file_lines);
  my $filename = '/tmp/foo';
  edit_file_lines(sub { s/badger/mushroom/g }, $filename);

The above would of course replace every occurrence of 'badger' with
'mushroom' in the file.

Cheers

Dave P


-- 
David Precious ("bigpresh") <dav...@preshweb.co.uk>
http://www.preshweb.co.uk/   www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin  www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan      www.preshweb.co.uk/github







Re: Gigantic file size processing error

2014-01-02 Thread Uri Guttman

On 01/02/2014 10:39 AM, David Precious wrote:

> On Thu, 2 Jan 2014 23:21:22 +0800 (SGT)
> mani kandan <mani_nm...@yahoo.com> wrote:
>
>> Hi,
>>
>> We have a huge file, 500MB in size. We need to manipulate the file - do
>> some replacements and then write it back out. I have used File::Slurp,
>> and it works for a file size of 300MB (thanks, Uri), but for this huge
>> 500MB file it does not process and comes out with an error. I have also
>> used the Tie::File module - same case, it does not process. Any
>> guidance?


> Firstly, be specific - "come out with error" doesn't help us - what is
> the error?
>
> Secondly - do you need to work on the file as a whole, or can you just
> loop over it, making changes, and writing them back out?  In other
> words, do you *need* to hold the whole file in memory at one time?
> More often than not, you don't.
>
> If it's per-line changes, then File::Slurp::edit_file_lines should work
> - e.g.:
>
>    use File::Slurp qw(edit_file_lines);
>    my $filename = '/tmp/foo';
>    edit_file_lines(sub { s/badger/mushroom/g }, $filename);
>
> The above would of course replace every occurrence of 'badger' with
> 'mushroom' in the file.


if there is a size issue, that would be just as bad as slurping in the 
whole file and it would use even more storage as it will be an array of 
all the lines internally. slurping in 500MB is not a smart thing unless 
you have many gigs of free ram. otherwise it will just be going to disk 
on the swap and you don't gain much other than simpler logic.


but i agree, knowing the error message and who is generating it will be 
valuable. it could be a virtual ram limitation on the OS which can be 
changed with the ulimit utility (or BSD::Resource if you have that module).
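
for example, checking and bumping the address-space limit with
BSD::Resource might look roughly like this (an untested sketch - whether
RLIMIT_AS exists and what values you may set depend on your OS):

   use BSD::Resource qw(getrlimit setrlimit RLIMIT_AS);

   # current soft/hard limits on virtual address space, in bytes
   my ($soft, $hard) = getrlimit(RLIMIT_AS);
   print "soft=$soft hard=$hard\n";

   # raise the soft limit to 2GB (a process can't exceed the hard limit)
   setrlimit(RLIMIT_AS, 2 * 1024**3, $hard)
       or die "setrlimit failed: $!";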


uri


--
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com





Re: Gigantic file size processing error

2014-01-02 Thread David Precious
On Thu, 02 Jan 2014 11:18:31 -0500
Uri Guttman <u...@stemsystems.com> wrote:
> On 01/02/2014 10:39 AM, David Precious wrote:
>
>> Secondly - do you need to work on the file as a whole, or can you
>> just loop over it, making changes, and writing them back out?  In
>> other words, do you *need* to hold the whole file in memory at one
>> time? More often than not, you don't.
>>
>> If it's per-line changes, then File::Slurp::edit_file_lines should
>> work - e.g.:
>>
>>    use File::Slurp qw(edit_file_lines);
>>    my $filename = '/tmp/foo';
>>    edit_file_lines(sub { s/badger/mushroom/g }, $filename);
>>
>> The above would of course replace every occurrence of 'badger' with
>> 'mushroom' in the file.
>
> if there is a size issue, that would be just as bad as slurping in
> the whole file and it would use even more storage as it will be an
> array of all the lines internally.

Oh - my mistake, I'd believed that edit_file_lines edited the file
line-by-line, writing the results to a temporary file and then
renaming the temporary file over the original at the end.

In that case, I think the docs are a little unclear:

"These subs read in a file into $_, execute a code block which should
modify $_ and then write $_ back to the file. The difference between
them is that edit_file reads the whole file into $_ and calls the code
block one time. With edit_file_lines each line is read into $_ and the
code is called for each line..."

and 

"These subs are the equivalent of the -pi command line options of
Perl..."

... to me, that sounds like edit_file_lines reads a line at a time
rather than slurping the whole lot - but looking at the code, it does
indeed read the entire file contents into RAM.  (I probably should have
expected anything in File::Slurp to, well, slurp the file... :) )

Part of me wonders if File::Slurp should provide an in-place (not
slurping into RAM) editing feature which works like edit_file_lines but
line-by-line using a temp file, but that's probably feature creep :)

OP - what didn't work about Tie::File?
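
(For reference, the Tie::File approach I'd expect is something like this
untested sketch - the filename and substitution are placeholders:

  use Tie::File;
  my $filename = '/tmp/foo';
  tie my @lines, 'Tie::File', $filename or die "Cannot tie $filename: $!";
  s/badger/mushroom/g for @lines;   # each changed line is written back to disk
  untie @lines;

... so the exact error you got would be useful to see.)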



-- 
David Precious ("bigpresh") <dav...@preshweb.co.uk>
http://www.preshweb.co.uk/   www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin  www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan      www.preshweb.co.uk/github







Re: Gigantic file size processing error

2014-01-02 Thread Uri Guttman

On 01/02/2014 11:48 AM, David Precious wrote:

> On Thu, 02 Jan 2014 11:18:31 -0500
> Uri Guttman <u...@stemsystems.com> wrote:
>
>> On 01/02/2014 10:39 AM, David Precious wrote:
>>
>>> Secondly - do you need to work on the file as a whole, or can you
>>> just loop over it, making changes, and writing them back out?  In
>>> other words, do you *need* to hold the whole file in memory at one
>>> time? More often than not, you don't.
>>>
>>> If it's per-line changes, then File::Slurp::edit_file_lines should
>>> work - e.g.:
>>>
>>>    use File::Slurp qw(edit_file_lines);
>>>    my $filename = '/tmp/foo';
>>>    edit_file_lines(sub { s/badger/mushroom/g }, $filename);
>>>
>>> The above would of course replace every occurrence of 'badger' with
>>> 'mushroom' in the file.
>>
>> if there is a size issue, that would be just as bad as slurping in
>> the whole file and it would use even more storage as it will be an
>> array of all the lines internally.
>
> Oh - my mistake, I'd believed that edit_file_lines edited the file
> line-by-line, writing the results to a temporary file and then
> renaming the temporary file over the original at the end.
>
> In that case, I think the docs are a little unclear:
>
> "These subs read in a file into $_, execute a code block which should
> modify $_ and then write $_ back to the file. The difference between
> them is that edit_file reads the whole file into $_ and calls the code
> block one time. With edit_file_lines each line is read into $_ and the
> code is called for each line..."



good point. i should emphasize that it does slurp in the file. tie::file 
only reads in chunks and moves around as you access elements. 
edit_file_lines slurps into an array and loops over those elements 
aliasing each one to $_. it definitely eats its own dog food!
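
in other words, roughly this (a simplified sketch of the idea, not the
actual module source):

   use File::Slurp qw(read_file write_file);

   sub my_edit_file_lines {
       my ($code, $file) = @_;
       my @lines = read_file($file);   # slurps the whole file into ram
       $code->() for @lines;           # aliases each line to $_
       write_file($file, @lines);
   }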



> and
>
> "These subs are the equivalent of the -pi command line options of
> Perl..."
>
> ... to me, that sounds like edit_file_lines reads a line at a time
> rather than slurping the whole lot - but looking at the code, it does
> indeed read the entire file contents into RAM.  (I probably should have
> expected anything in File::Slurp to, well, slurp the file... :) )


as i said, dog food is good! :)

i wrote edit_file and edit_file_lines as interesting wrappers around 
read_file and write_file. i assumed it was obvious they used those slurp 
functions.





> Part of me wonders if File::Slurp should provide an in-place (not
> slurping into RAM) editing feature which works like edit_file_lines but
> line-by-line using a temp file, but that's probably feature creep :)


that IS tie::file which i didn't want for efficiency reasons. it has to 
read/write back and forth every time you modify an element. edit_file 
(and _lines) are meant to be fast and simple to use for common editing 
of files. as with slurping, i didn't expect them to be used on .5GB 
files! :)


uri



--
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com





Re: Gigantic file size processing error

2014-01-02 Thread David Precious
On Thu, 02 Jan 2014 11:56:26 -0500
Uri Guttman <u...@stemsystems.com> wrote:

>> Part of me wonders if File::Slurp should provide an in-place (not
>> slurping into RAM) editing feature which works like edit_file_lines
>> but line-by-line using a temp file, but that's probably feature
>> creep :)
>
> that IS tie::file which i didn't want for efficiency reasons. it has
> to read/write back and forth every time you modify an element.
> edit_file (and _lines) are meant to be fast and simple to use for
> common editing of files. as with slurping, i didn't expect them to be
> used on .5GB files! :)

Oh, I was thinking of a wrapper that would:

(a) open a new temp file
(b) iterate over the source file, line-by-line, calling the provided
coderef for each line
(c) write $_ (potentially modified by the coderef) to the temp file
(d) finally, rename the temp file over the source file

Of course, it's pretty easy to write such code yourself, and as it
doesn't slurp the file in, it could be considered out of place in
File::Slurp.  I'd be fairly surprised if such a thing doesn't already
exist on CPAN, too.  (If it didn't, I might actually write such a
thing, as a beginner-friendly "here's how to easily modify a file, line
by line, with minimal effort" offering.)
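
(A rough sketch of what I mean - untested, and the name edit_file_by_line
is just a placeholder:

  sub edit_file_by_line {
      my ($code, $filename) = @_;
      my $tempname = "$filename.tmp";   # same directory, so the rename is cheap
      open my $in,  '<', $filename or die "Cannot open $filename: $!";
      open my $out, '>', $tempname or die "Cannot open $tempname: $!";
      while (<$in>) {
          $code->();                    # the coderef edits $_ in place
          print {$out} $_;
      }
      close $in;
      close $out or die "Cannot close $tempname: $!";
      rename $tempname, $filename or die "Cannot rename $tempname: $!";
  }

  # usage, mirroring edit_file_lines:
  edit_file_by_line(sub { s/badger/mushroom/g }, '/tmp/foo');

Only one line is ever held in memory.)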


-- 
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedinwww.preshweb.co.uk/facebook
www.preshweb.co.uk/cpanwww.preshweb.co.uk/github







Re: Gigantic file size processing error

2014-01-02 Thread Uri Guttman

On 01/02/2014 12:08 PM, David Precious wrote:

> On Thu, 02 Jan 2014 11:56:26 -0500
> Uri Guttman <u...@stemsystems.com> wrote:
>
>>> Part of me wonders if File::Slurp should provide an in-place (not
>>> slurping into RAM) editing feature which works like edit_file_lines
>>> but line-by-line using a temp file, but that's probably feature
>>> creep :)
>>
>> that IS tie::file which i didn't want for efficiency reasons. it has
>> to read/write back and forth every time you modify an element.
>> edit_file (and _lines) are meant to be fast and simple to use for
>> common editing of files. as with slurping, i didn't expect them to be
>> used on .5GB files! :)
>
> Oh, I was thinking of a wrapper that would:
>
> (a) open a new temp file
> (b) iterate over the source file, line-by-line, calling the provided
> coderef for each line
> (c) write $_ (potentially modified by the coderef) to the temp file
> (d) finally, rename the temp file over the source file
>
> Of course, it's pretty easy to write such code yourself, and as it
> doesn't slurp the file in, it could be considered out of place in
> File::Slurp.  I'd be fairly surprised if such a thing doesn't already
> exist on CPAN, too.  (If it didn't, I might actually write such a
> thing, as a beginner-friendly "here's how to easily modify a file, line
> by line, with minimal effort" offering.)




it wouldn't be a bad addition to file::slurp. call it something like 
edit_file_loop. if you write it, i will add it to the module. you can 
likely steal the code from edit_file_lines and modify that. i would 
document it as an alternative to edit_file_lines for very large files.


it will need pod, test files and good comments for me to add it. credit 
will be given :)


thanx,

uri

--
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com





Re: Gigantic file size processing error

2014-01-02 Thread Uri Guttman

On 01/02/2014 12:33 PM, David Precious wrote:

> On Thu, 02 Jan 2014 12:19:16 -0500
> Uri Guttman <u...@stemsystems.com> wrote:
>
>> On 01/02/2014 12:08 PM, David Precious wrote:
>>
>>> Oh, I was thinking of a wrapper that would:
>>>
>>> (a) open a new temp file
>>> (b) iterate over the source file, line-by-line, calling the provided
>>> coderef for each line
>>> (c) write $_ (potentially modified by the coderef) to the temp file
>>> (d) finally, rename the temp file over the source file
>
> [...]
>
>> it wouldn't be a bad addition to file::slurp. call it something like
>> edit_file_loop. if you write it, i will add it to the module. you can
>> likely steal the code from edit_file_lines and modify that. i would
>> document it as an alternative to edit_file_lines for very large files.
>>
>> it will need pod, test files and good comments for me to add it.
>> credit will be given :)
>
> Righto - I'll add it to my list of things awaiting tuit resupply :)



who is your tuit supplier? i am looking for a better and cheaper one.

uri


--
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com





Re: Gigantic file size processing error

2014-01-02 Thread David Precious
On Thu, 02 Jan 2014 12:19:16 -0500
Uri Guttman <u...@stemsystems.com> wrote:

> On 01/02/2014 12:08 PM, David Precious wrote:
>
>> Oh, I was thinking of a wrapper that would:
>>
>> (a) open a new temp file
>> (b) iterate over the source file, line-by-line, calling the provided
>> coderef for each line
>> (c) write $_ (potentially modified by the coderef) to the temp file
>> (d) finally, rename the temp file over the source file

[...]

> it wouldn't be a bad addition to file::slurp. call it something like
> edit_file_loop. if you write it, i will add it to the module. you can
> likely steal the code from edit_file_lines and modify that. i would
> document it as an alternative to edit_file_lines for very large files.
>
> it will need pod, test files and good comments for me to add it.
> credit will be given :)

Righto - I'll add it to my list of things awaiting tuit resupply :)



-- 
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedinwww.preshweb.co.uk/facebook
www.preshweb.co.uk/cpanwww.preshweb.co.uk/github







Re: Recursive Validation Function

2014-01-02 Thread John Aten
Hello again,

I wasn't able to continue with the project I was working on way back in 
September, but I wanted to send a quick note to thank everyone (Andy Bach, Jim 
Gibson, Rob Dixon and Mike Flannigan) for their help. I apologize for not 
following up sooner. Hopefully I'll have some time to get back to it some day 
soon; I was a little optimistic in my planning.  Thanks again!

On Sep 4, 2013, at 10:55 PM, beginners-digest-h...@perl.org wrote:

 
 beginners Digest 5 Sep 2013 03:55:27 - Issue 4576
 
 Topics (messages 123410 through 123433):
 
 Re: Recursive Validation Function
   123410 by: Andy Bach
   123411 by: Jim Gibson
   123412 by: Rob Dixon
   123413 by: Rob Dixon
   123427 by: Mike Flannigan
 
 
 --
 
 From: Andy Bach <afb...@gmail.com>
 Date: September 2, 2013 8:43:59 PM CDT
 To: John Aten <welcome.to.eye.o.r...@gmail.com>
 Cc: "beginners@perl.org" <beginners@perl.org>
 Subject: Re: Recursive Validation Function
 
 
 
 
 On Monday, September 2, 2013, John Aten wrote:
 my $valid_token = validate_tokens($token);
 
 
 Too bad it doesn't work! Even if I put in valid tokens on the first shot, 
 there are errors:
 
 You're passing the token as a parameter to the sub, but the sub is using $_
 for the match. You need to assign the parameter in the sub. Parameters
 arrive in the global @_ array; one idiom is:

 sub validate_token {
   my ($test_token) = @_;

 This assigns the first passed parameter to $test_token. The advantage being,
 if you add more params you can just insert a new var in the LHS list:

   my ($test_token, $empty_allowed, $cleanup) = @_;
 
 Now use $test_token instead of $_
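
 Applied to your sub (quoted below), the whole thing might look roughly like
 this (an untested sketch, keeping your two-regex structure; note the added
 return, so a corrected token actually propagates back up the recursion):

 sub validate_tokens {
     my ($test_token) = @_;
     if ($test_token !~ /^[ric]\.[1234578]$/i) {
         if ($test_token !~ /^[ric]\.6(\.[12])?$/i) {
             print "Enter valid tokens: ";
             my $new_token = <STDIN>;
             chomp $new_token;
             return validate_tokens($new_token);
         }
     }
     return $test_token;
 }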
 
  
 
 I am writing a script to rename files. The script prompts the user to enter 
 tokens which will comprise part of the file name. These are made of letters 
 and numbers separated by a dot, i.e. R.3. The letters can be R, I or C, upper 
 or lowercase. The first number can be one through eight, and if it is a six 
 there can be an additional dot followed by a one or two. To make sure that I 
 don't make any mistakes when putting in these tokens, I have tried to write a 
 function to make sure that they are valid:
 
 sub validate_tokens {
     if ($_ !~ /^[ric]\.[1234578]$/i) {
         if ($_ !~ /^[ric]\.6(\.[1|2])?$/i) {
             print "Enter valid tokens: ";
             my $new_token = <STDIN>;
             chomp $new_token;
             validate_tokens($new_token);
         }
     }
     return $_;
 }
 
 
 I'm calling it like this:

 print "Enter Tokens: ";
 my $token = <STDIN>;
 chomp $token;
 my $valid_token = validate_tokens($token);
 
 
 Too bad it doesn't work! Even if I put in valid tokens on the first shot, 
 there are errors:
 
 Use of uninitialized value $_ in pattern match (m//) at bulk.pl line 65, 
 <STDIN> line 3.
 Use of uninitialized value $_ in pattern match (m//) at bulk.pl line 66, 
 <STDIN> line 3.
 
 Lines 65 and 66 are the ones with the two if statements, lines 2 and 3 of the 
 first code snippet above. Each time I run it, the <STDIN> line number in the 
 error message increases by one. The script goes into an infinite loop, 
 prompting for valid tokens, but even if valid ones are entered it continues. 
 What am I missing?
 
 
 
 
 -- 
 
 a
 
 Andy Bach,
 afb...@gmail.com
 608 658-1890 cell
 608 261-5738 wk
 
 
 
 From: Jim Gibson <jimsgib...@gmail.com>
 Date: September 3, 2013 12:56:16 AM CDT
 To: Perl Beginners <beginners@perl.org>
 Subject: Re: Recursive Validation Function
 
 
 
 On Sep 2, 2013, at 6:23 PM, John Aten wrote:
 
 Hello all,
 
 I am writing a script to rename files. The script prompts the user to enter 
 tokens which will comprise part of the file name. These are made of letters 
 and numbers separated by a dot, i.e. R.3. The letters can be R, I or C, upper 
 or lowercase. The first number can be one through eight, and if it is a six 
 there can be an additional dot followed by a one or two. To make sure that I 
 don't make any mistakes when putting in these tokens, I have tried to write 
 a function to make sure that they are valid:
 
 Use $_[0] instead of $_, or one of the other suggestions from Andy.
 
 
 sub validate_tokens {
     if ($_ !~ /^[ric]\.[1234578]$/i) {
         if ($_ !~ /^[ric]\.6(\.[1|2])?$/i) {
 
 The '[1|2]' part of this regular expression is incorrect. Characters in a 
 character class do not need the pipe character for alternation. They are 
 already treated as alternates. This part should be '[12]'. Your pattern will 
 match the string 'r.6|'
 
 You can combine the two regular expressions by using alternation and 
 non-capturing grouping (clustering):
 
 if( $_[0] !~ /^[ric]\.(?:[1-578]|6(?:\.[12])?)$/i ) {