Re: File Size Limit
Hi James, please reply to all recipients.

On Fri, 8 Apr 2016 10:47:57 +0100 James Kerwin <jkerwin2...@gmail.com> wrote:

> Good morning/afternoon all (depending on where you are),
>
> This should be a quick one:
>
> When creating files in a Perl script, is there a way to limit the size of
> the file created? If somebody could give me a term to search for, that
> should be enough.
>
> I have googled this but I can't find an answer. I'm probably not using the
> right search terms, because this seems very do-able and I'm surprised to
> have not found anything.
>
> My situation is as follows:
>
> I perform some text manipulation on files that are 30 MB in size.
> The newly formatted files get pushed to another script that can only
> handle files of 5 MB maximum.
>
> So I would like to be able to limit the file size and start a new one when
> it reaches (or comes close to) this limit. This would allow me to automate
> it rather than having to manually break the big files up before continuing.

You can tell the current position in the file using
http://perldoc.perl.org/functions/tell.html and write some logic to handle
it. There's also http://perldoc.perl.org/functions/truncate.html .

One option would be to use a custom file handle (see perldoc perltie), but
that may be much slower than implementing it at a higher level.

Hope it helps.

Regards,

Shlomi

> Thanks!
> James.
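A minimal sketch of the tell()-based splitting Shlomi describes. The input name 'big.txt' and the "partNNN.txt" naming scheme are illustrative assumptions, not from the thread; only the 5 MB limit comes from James's post:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $limit = 5 * 1024 * 1024;    # 5 MB per output part
    my $part  = 0;
    my $out;

    open my $in, '<', 'big.txt' or die "Cannot open big.txt: $!";

    while (my $line = <$in>) {
        # Roll over to a new part if none is open yet, or if writing this
        # line would push the current part past the limit (checked via tell).
        if (!defined $out or tell($out) + length($line) > $limit) {
            close $out if defined $out;
            open $out, '>', sprintf('part%03d.txt', ++$part)
                or die "Cannot open part file: $!";
        }
        print $out $line;
    }

    close $out if defined $out;
    close $in;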
File Size Limit
Good morning/afternoon all (depending on where you are),

This should be a quick one:

When creating files in a Perl script, is there a way to limit the size of the file created? If somebody could give me a term to search for, that should be enough.

I have googled this but I can't find an answer. I'm probably not using the right search terms, because this seems very do-able and I'm surprised to have not found anything.

My situation is as follows:

I perform some text manipulation on files that are 30 MB in size. The newly formatted files get pushed to another script that can only handle files of 5 MB maximum.

So I would like to be able to limit the file size and start a new one when it reaches (or comes close to) this limit. This would allow me to automate it rather than having to manually break the big files up before continuing.

Thanks!
James.
Re: Gigantic file size processing error
In article 1388676082.98276.yahoomail...@web193403.mail.sg3.yahoo.com, mani_nm...@yahoo.com (mani kandan) wrote:

> Hi, We have a file of huge size, 500MB. We need to manipulate the file (some replacement) and then write the file. I have used File::Slurp and it works for a file size of 300MB (Thanks Uri), but for this huge size of 500MB it is not processing and comes out with an error. I have also used the Tie::File module, same case, not processing. Any guidance? regards Manikandan

Hi, have you tried this kind of command:

    perl -p -i -e 's/oneThing/otherThing/g' yourFile

Does it hang or not? And, 500MB is not a gigantic file :)

-- 
klp
Re: Gigantic file size processing error
Hi List,

On Friday, January 03, 2014 10:57:13 AM kurtz le pirate wrote:

> have you tried this kind of command:
>
>     perl -p -i -e 's/oneThing/otherThing/g' yourFile

I was about to post the same thing. My suggestion: create a backup file just in case something goes wrong.

    perl -pi.bak -e 's/oneThing/otherThing/g' yourFile

This creates a backup named yourFile.bak prior to processing yourFile.

> Does it hang or not?

I have processed files > 2G this way, no problems encountered.

Regards,
Jan

-- 
When a woman marries again it is because she detested her first husband. When a man marries again, it is because he adored his first wife. -- Oscar Wilde
Re: Gigantic file size processing error
Am 02.01.2014 18:08, schrieb David Precious:

> Oh, I was thinking of a wrapper that would:
>
> (a) open a new temp file
> (b) iterate over the source file, line-by-line, calling the provided coderef for each line
> (c) write $_ (potentially modified by the coderef) to the temp file
> (d) finally, rename the temp file over the source file
>
> Of course, it's pretty easy to write such code yourself, and as it doesn't slurp the file in, it could be considered out of place in File::Slurp. I'd be fairly surprised if such a thing doesn't already exist on CPAN, too. (If it didn't, I might actually write such a thing, as a beginner-friendly "here's how to easily modify a file, line by line, with minimal effort" offering.)

A short look at CPAN brings up https://metacpan.org/pod/File::Inplace which looks like it does what the OP wants. Honestly, I have never used it, and it may also have a performance problem, but at least I looked at its source code: it implements this via a temporary file, without holding the whole file in memory.

Greetings,
Janek
Re: Gigantic file size processing error
Hi,

Thanks for all your guidance. The error was "Perl Command Line Interpreter has encountered a problem and needs to close". I also increased the virtual memory, no use. My system configuration: OS XP SP3, Intel Core 2 Duo with 2 GB RAM.

regards
Manikandan N

On Friday, 3 January 2014 9:06 PM, Janek Schleicher janek_schleic...@yahoo.de wrote:

> A short look at CPAN brings up https://metacpan.org/pod/File::Inplace which looks like it does what the OP wants. Honestly, I have never used it, and it may also have a performance problem, but at least I looked at its source code: it implements this via a temporary file, without holding the whole file in memory.
Re: Gigantic file size processing error
On 01/03/2014 10:22 AM, Janek Schleicher wrote:

> A short look at CPAN brings up https://metacpan.org/pod/File::Inplace which looks like it does what the OP wants. Honestly, I have never used it, and it may also have a performance problem, but at least I looked at its source code: it implements this via a temporary file, without holding the whole file in memory.

i haven't seen that before but it was last touched in 2005. its api requires method calls to get each line, another method call to replace a line and such. i would call that somewhat clunky compared to edit_file_lines and its primary arg of a code block that modifies $_. likely it will be much slower for typical files as well. now for very large files, we can't tell.

we still haven't heard back from the OP about the actual error. my conjecture of a resource limit still feels right. neither perl nor File::Slurp would have any errors on a large file other than limited resources. and that can be fixed with a ulimit call or similar.

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Re: Gigantic file size processing error
On 01/03/2014 12:10 PM, mani kandan wrote:

> Hi, Thanks for all your guidance. The error was "Perl Command Line Interpreter has encountered a problem and needs to close".

that isn't the real error. you need to run this in a command window that won't close after it fails so you can see the real error message.

> Also increased the virtual memory, no use. My system configuration: OS XP SP3, Intel Core 2 Duo with 2 GB RAM.

that isn't a lot of ram for a 500MB file to be slurped. increasing the virtual ram won't help as it will likely be mostly in swap. i don't know windows much so i can't say how to really check/set the virtual size of a process. try doing this on linux or on a box with much more ram. otherwise use a perl -p one-liner loop and it should work.

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Re: Gigantic file size processing error
On Fri, 03 Jan 2014 12:22:48 -0500 Uri Guttman u...@stemsystems.com wrote:

> i haven't seen that before but it was last touched in 2005.

That means it has no bugs. A better metric of a module's quality is how many outstanding bugs there are. See https://rt.cpan.org//Dist/Display.html?Queue=File-Inplace

-- 
Don't stop where the ink does.
        Shawn
Re: Gigantic file size processing error
On 01/03/2014 12:48 PM, Shawn H Corey wrote:

> On Fri, 03 Jan 2014 12:22:48 -0500 Uri Guttman u...@stemsystems.com wrote:
>
>> i haven't seen that before but it was last touched in 2005.
>
> That means it has no bugs. A better metric of a module's quality is how many outstanding bugs there are. See https://rt.cpan.org//Dist/Display.html?Queue=File-Inplace

it also means it may be rotting on the vine. or no one uses it to report bugs. or it is an orphan module. or no requests for new features (popular modules always get those). stable doesn't always mean it is good. considering i wrote edit_file_lines and never heard of that module until now, it says the module isn't known or used. in fact metacpan says it has no reverse dependencies (not one module or distribution uses it). not bragging, but File::Slurp has over 600 reverse dependencies. that means i get feature requests, more bug reports, etc. you may need to look at the whole picture before you decide to use a module. if this module was so useful, why isn't it being used by anyone since 2005?

i think a major negative is the very odd api which i already mentioned. you have to do a lot of work to use it and it doesn't gain much because of that. it does have a commit/rollback thing but again, that is easy to code up yourself. just write to a temp file and either rename it or delete it. not much of a win there.

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Re: Gigantic file size processing error
On 02/01/2014 15:21, mani kandan wrote:

> Hi, We have a file of huge size, 500MB. We need to manipulate the file (some replacement) and then write the file. I have used File::Slurp and it works for a file size of 300MB (Thanks Uri), but for this huge size of 500MB it is not processing and comes out with an error. I have also used the Tie::File module, same case, not processing. Any guidance?

Slurping entire files into memory is usually overkill, and you should only do it if you can afford the memory and *really need* random access to the entire file at once. Most of the time a simple sequential read/modify/write is appropriate, and Perl will take care of buffering the input and output files in reasonable amounts.

According to your later posts you have just 2GB of memory, and although Windows XP *can* run in 500MB I wouldn't like to see a program that slurped a quarter of the entire memory.

I haven't seen you describe what processing you want to do on the file. If the input is a text file and the changes can be done line by line, then you are much better off with a program that looks like this:

    use strict;
    use warnings;

    open my $in,  '<', 'myfile.txt'  or die $!;
    open my $out, '>', 'outfile.txt' or die $!;

    while (<$in>) {
        s/from string/to string/g;
        print $out $_;
    }

    __END__

But if you need more, then I would guess that Tie::File is your best bet. You don't say what problems you are getting using this module, so please explain.

Rob
Re: Gigantic file size processing error
On 01/03/2014 02:28 PM, Rob Dixon wrote:

> Slurping entire files into memory is usually overkill, and you should only do it if you can afford the memory and *really need* random access to the entire file at once. Most of the time a simple sequential read/modify/write is appropriate, and Perl will take care of buffering the input and output files in reasonable amounts.

of course i differ on that opinion. slurping is almost always faster and in many cases the code is simpler than line-by-line i/o. also you can do much easier parsing and processing of whole files in a single scalar than line by line. and "reasonable size" has shifted dramatically over the decades. in the olden days line by line was mandated due to small amounts of ram. the typical file size (code, configs, text, markup, html, etc) has not grown much since then but ram has gotten so large and cheap. slurping is the way to go today other than for genetics, logs and similar super large files.

> But if you need more, then I would guess that Tie::File is your best bet. You don't say what problems you are getting using this module, so please explain.

tie::file will be horrible for editing a large file like that. your line-by-line or similar code would be much better. tie::file does so much seeking and i/o, much more than linear access buffering would do. when lines wrap over block boundaries (much more likely than not), tie::file does extra amounts of i/o.

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Gigantic file size processing error
Hi,

We have a file of huge size, 500MB. We need to manipulate the file (some replacement) and then write the file. I have used File::Slurp and it works for a file size of 300MB (Thanks Uri), but for this huge size of 500MB it is not processing and comes out with an error. I have also used the Tie::File module, same case, not processing. Any guidance?

regards
Manikandan
Re: Gigantic file size processing error
On Thu, 2 Jan 2014 23:21:22 +0800 (SGT) mani kandan mani_nm...@yahoo.com wrote:

> Hi, We have a file of huge size, 500MB. We need to manipulate the file (some replacement) and then write the file. I have used File::Slurp and it works for a file size of 300MB (Thanks Uri), but for this huge size of 500MB it is not processing and comes out with an error. I have also used the Tie::File module, same case, not processing. Any guidance?

Firstly, be specific - "comes out with an error" doesn't help us - what is the error?

Secondly - do you need to work on the file as a whole, or can you just loop over it, making changes, and writing them back out? In other words, do you *need* to hold the whole file in memory at one time? More often than not, you don't.

If it's per-line changes, then File::Slurp::edit_file_lines should work - for e.g.:

    use File::Slurp qw(edit_file_lines);
    my $filename = '/tmp/foo';
    edit_file_lines(sub { s/badger/mushroom/g }, $filename);

The above would of course replace every occurrence of 'badger' with 'mushroom' in the file.

Cheers

Dave P

-- 
David Precious ("bigpresh") dav...@preshweb.co.uk
http://www.preshweb.co.uk/
www.preshweb.co.uk/twitter www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook www.preshweb.co.uk/cpan www.preshweb.co.uk/github
Re: Gigantic file size processing error
On 01/02/2014 10:39 AM, David Precious wrote:

> Secondly - do you need to work on the file as a whole, or can you just loop over it, making changes, and writing them back out? In other words, do you *need* to hold the whole file in memory at one time? More often than not, you don't.
>
> If it's per-line changes, then File::Slurp::edit_file_lines should work - for e.g.:
>
>     use File::Slurp qw(edit_file_lines);
>     my $filename = '/tmp/foo';
>     edit_file_lines(sub { s/badger/mushroom/g }, $filename);
>
> The above would of course replace every occurrence of 'badger' with 'mushroom' in the file.

if there is a size issue, that would be just as bad as slurping in the whole file, and it would use even more storage as it will be an array of all the lines internally. slurping in 500MB is not a smart thing unless you have many gigs of free ram. otherwise it will just be going to disk on the swap and you don't gain much other than simpler logic.

but i agree, knowing the error message and who is generating it will be valuable. it could be a virtual ram limitation on the OS which can be changed with the ulimit utility (or BSD::Resource if you have that module).

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
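A sketch of the BSD::Resource route Uri mentions, for Unix-like systems only (so it would not have helped on the OP's Windows XP box). This assumes the platform defines RLIMIT_AS, which not all do:

    use BSD::Resource qw(getrlimit setrlimit RLIMIT_AS);

    # Read the current soft and hard limits on the process address space...
    my ($soft, $hard) = getrlimit(RLIMIT_AS);
    printf "soft=%s hard=%s\n", $soft, $hard;

    # ...and raise the soft limit up to the hard limit before slurping.
    setrlimit(RLIMIT_AS, $hard, $hard)
        or warn "could not raise address-space limit: $!";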
Re: Gigantic file size processing error
On Thu, 02 Jan 2014 11:18:31 -0500 Uri Guttman u...@stemsystems.com wrote:

> if there is a size issue, that would be just as bad as slurping in the whole file and it would use even more storage as it will be an array of all the lines internally.

Oh - my mistake, I'd believed that edit_file_lines edited the file line-by-line, writing the results to a temporary file and then renaming the temporary file over the original at the end.

In that case, I think the docs are a little unclear:

"These subs read in a file into $_, execute a code block which should modify $_ and then write $_ back to the file. The difference between them is that edit_file reads the whole file into $_ and calls the code block one time. With edit_file_lines each line is read into $_ and the code is called for each line..."

and

"These subs are the equivalent of the -pi command line options of Perl..."

... to me, that sounds like edit_file_lines reads a line at a time rather than slurping the whole lot - but looking at the code, it does indeed read the entire file contents into RAM. (I probably should have expected anything in File::Slurp to, well, slurp the file... :) )

Part of me wonders if File::Slurp should provide an in-place (not slurping into RAM) editing feature which works like edit_file_lines but line-by-line using a temp file, but that's probably feature creep :)

OP - what didn't work about Tie::File?

-- 
David Precious ("bigpresh") dav...@preshweb.co.uk
http://www.preshweb.co.uk/
www.preshweb.co.uk/twitter www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook www.preshweb.co.uk/cpan www.preshweb.co.uk/github
Re: Gigantic file size processing error
On 01/02/2014 11:48 AM, David Precious wrote:

> Oh - my mistake, I'd believed that edit_file_lines edited the file line-by-line, writing the results to a temporary file and then renaming the temporary file over the original at the end.
>
> In that case, I think the docs are a little unclear:
>
> "These subs read in a file into $_, execute a code block which should modify $_ and then write $_ back to the file. The difference between them is that edit_file reads the whole file into $_ and calls the code block one time. With edit_file_lines each line is read into $_ and the code is called for each line..."

good point. i should emphasize that it does slurp in the file. tie::file only reads in chunks and moves around as you access elements. edit_file_lines slurps into an array and loops over those elements aliasing each one to $_. it definitely eats its own dog food!

> and
>
> "These subs are the equivalent of the -pi command line options of Perl..."
>
> ... to me, that sounds like edit_file_lines reads a line at a time rather than slurping the whole lot - but looking at the code, it does indeed read the entire file contents into RAM. (I probably should have expected anything in File::Slurp to, well, slurp the file... :) )

as i said, dog food is good! :) i wrote edit_file and edit_file_lines as interesting wrappers around read_file and write_file. i assumed it was obvious they used those slurp functions.

> Part of me wonders if File::Slurp should provide an in-place (not slurping into RAM) editing feature which works like edit_file_lines but line-by-line using a temp file, but that's probably feature creep :)

that IS tie::file which i didn't want for efficiency reasons. it has to read/write back and forth every time you modify an element. edit_file (and _lines) are meant to be fast and simple to use for common editing of files. as with slurping, i didn't expect them to be used on .5GB files! :)

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Re: Gigantic file size processing error
On Thu, 02 Jan 2014 11:56:26 -0500 Uri Guttman u...@stemsystems.com wrote:

>> Part of me wonders if File::Slurp should provide an in-place (not slurping into RAM) editing feature which works like edit_file_lines but line-by-line using a temp file, but that's probably feature creep :)
>
> that IS tie::file which i didn't want for efficiency reasons. it has to read/write back and forth every time you modify an element. edit_file (and _lines) are meant to be fast and simple to use for common editing of files. as with slurping, i didn't expect them to be used on .5GB files! :)

Oh, I was thinking of a wrapper that would:

(a) open a new temp file
(b) iterate over the source file, line-by-line, calling the provided coderef for each line
(c) write $_ (potentially modified by the coderef) to the temp file
(d) finally, rename the temp file over the source file

Of course, it's pretty easy to write such code yourself, and as it doesn't slurp the file in, it could be considered out of place in File::Slurp. I'd be fairly surprised if such a thing doesn't already exist on CPAN, too. (If it didn't, I might actually write such a thing, as a beginner-friendly "here's how to easily modify a file, line by line, with minimal effort" offering.)

-- 
David Precious ("bigpresh") dav...@preshweb.co.uk
http://www.preshweb.co.uk/
www.preshweb.co.uk/twitter www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook www.preshweb.co.uk/cpan www.preshweb.co.uk/github
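A sketch of the wrapper David describes, using the name edit_file_loop that Uri proposes later in the thread (the name is only a proposal; no such function shipped in File::Slurp at the time):

    use strict;
    use warnings;
    use File::Temp qw(tempfile);
    use File::Basename qw(dirname);

    sub edit_file_loop {
        my ($code, $file) = @_;

        # (a) create the temp file in the same directory as the source,
        # so the final rename stays on one filesystem
        my ($tmp_fh, $tmp_name) =
            tempfile('edit_XXXX', DIR => dirname($file), UNLINK => 0);

        # (b) iterate over the source file line by line
        open my $in, '<', $file or die "cannot open $file: $!";
        while (<$in>) {
            $code->();           # the coderef edits $_ in place
            print {$tmp_fh} $_;  # (c) write the (maybe modified) line out
        }
        close $in;
        close $tmp_fh;

        # (d) rename the temp file over the source file
        rename $tmp_name, $file or die "cannot rename $tmp_name: $!";
    }

    # usage, matching the earlier example:
    edit_file_loop(sub { s/badger/mushroom/g }, '/tmp/foo');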
Re: Gigantic file size processing error
On 01/02/2014 12:08 PM, David Precious wrote:

> Oh, I was thinking of a wrapper that would:
>
> (a) open a new temp file
> (b) iterate over the source file, line-by-line, calling the provided coderef for each line
> (c) write $_ (potentially modified by the coderef) to the temp file
> (d) finally, rename the temp file over the source file
>
> Of course, it's pretty easy to write such code yourself, and as it doesn't slurp the file in, it could be considered out of place in File::Slurp. I'd be fairly surprised if such a thing doesn't already exist on CPAN, too. (If it didn't, I might actually write such a thing, as a beginner-friendly "here's how to easily modify a file, line by line, with minimal effort" offering.)

it wouldn't be a bad addition to file::slurp. call it something like edit_file_loop. if you write it, i will add it to the module. you can likely steal the code from edit_file_lines and modify that. i would document it as an alternative to edit_file_lines for very large files. it will need pod, test files and good comments for me to add it. credit will be given :)

thanx,

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Re: Gigantic file size processing error
On 01/02/2014 12:33 PM, David Precious wrote:

> On Thu, 02 Jan 2014 12:19:16 -0500 Uri Guttman u...@stemsystems.com wrote:
>
>> it wouldn't be a bad addition to file::slurp. call it something like edit_file_loop. if you write it, i will add it to the module. [...] credit will be given :)
>
> Righto - I'll add it to my list of things awaiting tuit resupply :)

who is your tuit supplier? i am looking for a better and cheaper one.

uri

-- 
Uri Guttman - The Perl Hunter
The Best Perl Jobs, The Best Perl Hackers
http://PerlHunter.com
Re: Gigantic file size processing error
On Thu, 02 Jan 2014 12:19:16 -0500 Uri Guttman u...@stemsystems.com wrote:

> On 01/02/2014 12:08 PM, David Precious wrote:
>> Oh, I was thinking of a wrapper that would:
>> (a) open a new temp file
>> (b) iterate over the source file, line-by-line, calling the provided coderef for each line
>> (c) write $_ (potentially modified by the coderef) to the temp file
>> (d) finally, rename the temp file over the source file
> [...]
> it wouldn't be a bad addition to file::slurp. call it something like edit_file_loop. if you write it, i will add it to the module. you can likely steal the code from edit_file_lines and modify that. i would document it as an alternative to edit_file_lines for very large files. it will need pod, test files and good comments for me to add it. credit will be given :)

Righto - I'll add it to my list of things awaiting tuit resupply :)

-- 
David Precious ("bigpresh") dav...@preshweb.co.uk
http://www.preshweb.co.uk/
www.preshweb.co.uk/twitter www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook www.preshweb.co.uk/cpan www.preshweb.co.uk/github
Re: File Size Script Help - Working Version
Hi folks, happy new year to everyone. )

John, you're right, of course. ) The filenames in nested directories could well overlap, and using $File::Find::name would be safer. Didn't think of that as a big problem, though, as the original script (with 'opendir') ignored all the nested folders anyway.

Jonathan, no, you don't have to store the filenames as an array: complete pathnames of files can't be repeated. It'll be sufficient just to change this line:

    $filedata{$_} = [$filesize, $filemd5]

for

    $filedata{$File::Find::name} = [$filesize, $filemd5]

(and replace catfile in the writing block as well, as %filedata keys will now be the whole filenames themselves).

On sorting: cmp and <=> are not the same: the former compares strings, the latter numbers. So, for example, 'abc' cmp 'def' gives you -1, but 'abc' <=> 'def' gives 0 (and warnings about non-numeric args as well). It's nice to know the difference, but... do you really need to sort the output in your script? What output? ) It makes no difference in what order your .md5 files will be created, right? And you don't need to print the list of files processed (as I did in my test script, which is why the ordering was ever mentioned).

As for $_, the problem John mentioned is logical, not 'perlical': as the $_ variable is assigned a filename processed within the File::Find target sub, and files in different directories could have the same names (but not full names, with absolute path attached), it may cause a bit of confusion when they DO have the same names. )

Generally speaking, $_ usage is for comfort and speed (yes, thinking of it as of the 'it' word is right )). Of course, you can reassign it, but it'll make your scripts bigger (sometimes _much_ bigger) without adding much clarity, in my opinion. But that, again, is a matter of taste. For me, I use $_ almost every time I process shallow collections (hashes or arrays, doesn't matter). When a two-level (or more complex) data structure is processed, it's usually required to use a temporary variable - but even then inner layers can be iterated with $_ easily.

-- iD
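A small demonstration of that cmp/<=> distinction against the %filedata layout used in this thread (filename => [size, md5]); the variable names are the thread's, the snippet itself is illustrative:

    # numeric sort on the stored size, largest file first
    my @largest_first =
        sort { $filedata{$b}[0] <=> $filedata{$a}[0] } keys %filedata;

    # string sort on the filenames themselves
    my @alphabetical = sort { $a cmp $b } keys %filedata;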
Re: File Size Script Help - Working Version
Hello:

On Sat, Dec 31, 2011 at 02:56:50AM +0200, Igor Dovgiy wrote:

> $filedata{$_} = [$filesize, $filemd5];

*snip*

> my ($size, $md5) = @{ $filedata{$filename} };

Alternatively, store a nested hash-reference:

    $filedata{$File::Find::name} = {
        md5  => $file_md5,
        size => $file_size,
    };

    # ...

    my ($size, $md5) = @{$filedata{$filename}}{qw/size md5/};

That way you don't need to remember which [arbitrary] order they're in, and if the order changes, or more fields are added, the meaning of subsequent code doesn't change.

Regards,

-- 
Brandon McCaig bamcc...@gmail.com bamcc...@castopulence.org
Castopulence Software https://www.castopulence.org/
Blog http://www.bamccaig.com/
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }. q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.}; tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'
Re: File Size Script Help - Working Version
On Sat, Dec 31, 2011 at 4:29 AM, John W. Krahn jwkr...@shaw.ca wrote:

> Igor Dovgiy wrote:
>
>> But much more natural (at least, for me) is to organize your hash (let's call it %filedata) so that filenames (which are unique by their nature) become the keys. And some info about these files - sizes and md5-hashes - become the values.
>
> Yes, file names in a given directory _have_ to be unique, however...
>
>> For example, our `wanted` (btw, its name is misleading a bit, no? may be 'process' will sound better?) sub may look as follows:
>>
>>     find(\&wanted, $path);
>>
>>     my %filedata;
>>
>>     sub wanted {
>>         return if substr($_, 0, 1) eq '.' || -d $_;
>>         my $filesize = -s _;
>>         open my $fh, '<', $_ or die $!, $/;
>>         my $filemd5 = Digest::MD5->new->addfile($fh)->hexdigest;
>>         close $fh;
>>         $filedata{$_} = [$filesize, $filemd5];
>
> You are traversing a directory tree, so using $_ as the key may cause collisions across different directories. Better to use $File::Find::name which contains the full absolute path name.
>
> John

Hi to all on the list still following this thread - and Happy New Year!

Igor... Thanks!! : ) It does feel like there has been some really good Perl learning progress being made here - and yep, I cannot believe how trimmed down the script has now become. Looking back on the original script makes me laugh! I wonder if that will become a consistent theme when writing?!

Looking back to the hash - I agree that it makes far more sense to have the filenames as the keys. Quoting yourself and John: "filenames (which are unique by their nature)"; "Yes, file names in a given directory _have_ to be unique". I think that we can all be in agreement then that these entries should be guaranteed to have unique keys and can have non-unique data such as file size attributed to them - therefore:

    push @{$files{$filename}}, $File::Find::name;

When sorting the hash, there seems to be well established code for this, e.g. sorting by file size:

    foreach (sort { $filedata{$b} <=> $filedata{$a} } keys %filedata) {
        ## should sort so that the highest value file size is first
        ...
    }

As far as I'm aware, <=> and cmp are the same thing. Is there a question of precedence over them? I assume that <=> has a higher precedence.

Interestingly, this is now the second time in this thread that we have been warned against using $_. From John: "$_ as the key may cause collisions across different directories". From Shlomi: "The $_ variable can be easily devastated. You should use a lexical one."

I believe that I understand the use of $_ as the default variable; indeed, the documentation on CPAN about File::Find states the usage of $_ in the module. However, it seems that it is a variable that is so easily destroyed, and many have warned against using it. If this is the case, why would we choose (or be required) to use it in the first place?

I have read the 'Elements to Avoid' page, as recommended by Shlomi: http://perl-begin.org/tutorials/bad-elements/ which is very useful. Would it be correct to say that $_ should be re-assigned asap whenever using Perl? I couldn't find any exceptions that state that it is ok to use it.

#

Sincere thanks again to you all for your contributions. I hope that others reading this list are learning as much as I am!

All the best

Jonathan
Re: File Size Script Help - Working Version
Hi Jonathan,

Argh, really stupid mistake by me. ) But let's use it to explain some points a bit further, shall we? A skilled craftsman knows his tools well, and a Perl programmer (with CPAN as THE collection of tools of all sizes and meanings) has an advantage here: even if documentation is a bit vague about what's going on, we are (usually) able to check the code itself to find the answers. )

By browsing the File::Spec source (found via the 'Source' link within the 'File::Spec' page at CPAN)...

http://cpansearch.perl.org/src/SMUELLER/PathTools-3.33/lib/File/Spec.pm

...we soon discover that this module is essentially an adapter for modules like File::Spec::Unix, File::Spec::Mac, File::Spec::Win32 etc. So our search goes on (as your mention of the .DS_Store file implies) over there:

http://cpansearch.perl.org/src/SMUELLER/PathTools-3.33/lib/File/Spec/Mac.pm

Now we may either check the documentation (which clearly states that only the last argument to catfile is considered a filename, and all the others will be concatenated with catdir), or look right into the code - and come to the same conclusion:

    sub catfile {
        my $self = shift;
        return '' unless @_;
        my $file = pop @_;
        return $file unless @_;
        my $dir = $self->catdir(@_);
        $file =~ s/^://s;
        return $dir . $file;
    }

So what should we do now? ) Of course, give milk to our cat... and arguments to File::Spec's catfile! ) Like this:

    File::Spec->catfile($path, $dircontents . '.md5')

... or this...

    File::Spec->catfile($path, "$dircontents.md5")

(check 'variable interpolation in Perl' to see why it's possible - and why this is essentially the same as the previous code line)

... or even this ...

    File::Spec->catfile($path, join '.', $dircontents, 'md5')

(but that would be a bit of an overkill, of course :)

Speaking of overkill: you used a regex (=~ /^\./) to check whether the line begins with a dot - or not. ) It's ok for this task, but you probably should know that this check may also be done with (substr($line, 0, 1) eq '.') code, which will be a bit (up to 30% on my PC when Benchmark'ed) faster.

-- iD

2011/12/30 Jonathan Harris jtnhar...@googlemail.com

> I tried to use your suggestion
>
>     open my $wr_fh, '>', File::Spec->catfile($path, $dircontents, '.md5') or die $!, $/
>
> but it returned an error on the command line: 'Not a directory'. At which point the program dies (which is what it is supposed to do!) I used it inside the loop - sorry to bug you for clarification
>
>     if ($dircontents =~ /^\./ || -d $dircontents) {
>         next;
>     }
>
> This is also to avoid the file .DS_Store
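For anyone who wants to reproduce that comparison, a small harness along the lines Igor hints at, using the core Benchmark module (the test string is an assumption; exact percentages will vary by machine and perl version):

    use Benchmark qw(cmpthese);

    my $line = '.DS_Store';

    # run each check for ~2 CPU seconds and print a comparison table
    cmpthese(-2, {
        regex  => sub { my $hit = ($line =~ /^\./)             },
        substr => sub { my $hit = (substr($line, 0, 1) eq '.') },
    });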
Re: File Size Script Help - Working Version
Hi John, yes, good point! Totally forgot this. ) Adding new files to a directory as you browse it is just not right, of course. Possible, but not right. )

I'd solve this by using a hash with filenames as keys and the collected 'result' strings (with md5 and filesizes) as values, filled by the File::Find target routine. After the whole directory is processed, this hash should be 'written out' into the target directory. Another way to do it is to collect all the filenames into a list (with the glob operator, for example), and process this list after.

BTW (to Jonathan), I wonder, do you really need to store this kind of data in different files? No offence... but I can hardly imagine how this data will be used later unless gathered into some array or hash. )

-- iD

2011/12/30 John W. Krahn jwkr...@shaw.ca

> Jonathan Harris wrote:
>
>> Hi John
>> Thanks for your 2 cents
>> I hadn't considered that the module wouldn't be portable
>
> That is not what I was implying. I was saying that when you add new files to a directory that you are traversing you _may_ get irregular results. It depends on how your operating system updates directory entries.
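A sketch of the "collect first, write after" pattern Igor describes; $path handling and the exact contents written to each .md5 file are assumptions pieced together from the thread:

    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    my $path = shift || '.';
    my %results;

    # pass 1: collect md5 and size for every plain file, writing nothing
    find(sub {
        return if substr($_, 0, 1) eq '.' || -d $_;
        open my $fh, '<', $_ or die "cannot open $_: $!";
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        $results{$File::Find::name} = sprintf "%s\n%d\n", $md5, -s $_;
    }, $path);

    # pass 2: only after the traversal is done, create the .md5 files,
    # so readdir never sees files that were added mid-scan
    for my $name (keys %results) {
        open my $out, '>', "$name.md5" or die "cannot write $name.md5: $!";
        print $out $results{$name};
        close $out;
    }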
Re: File Size Script Help - Working Version
On Fri, Dec 30, 2011 at 11:58 AM, Igor Dovgiy ivd.pri...@gmail.com wrote:

> I'd solve this by using a hash with filenames as keys and the collected 'result' strings (with md5 and filesizes) as values, filled by the File::Find target routine. After the whole directory is processed, this hash should be 'written out' into the target directory. Another way to do it is to collect all the filenames into a list (with the glob operator, for example), and process this list after.

Hi All

John - Thanks for the clarification. In this instance the script has been run on OSX - it seems that adding the files into the directory that is being traversed works ok this time. However, for best practice, I would certainly look into writing to a separate directory, and then moving the files back, as I appreciate that this fortune may not necessarily be repeated in a different environment!

Igor - Firstly - File::Spec. Thanks for your insight and well explained investigation - I have been learning a lot from this. File::Spec has proven a most useful tool in joining and 'stringifying' the paths.

In the original post about this script, I had spoken about considering using a hash for the file data. I'm still convinced that ultimately, this would be the way forwards. I have found some scripts online concerning finding duplicate files. They use md5 and/or file sizes to compare the files. These are written into hashes. Fully understanding some of these scripts is a little beyond my level at the moment. I have attached an interesting one for you to look at (you may be aware of it already!) However, it has proved quite inspiring!

> (substr($line, 0, 1) eq '.')

Haven't learned this yet! It looks like a good solution if it is so much more efficient - thanks for the introduction - I'll be reading up asap!

> BTW (to Jonathan), I wonder, do you really need to store this kind of data in different files? No offence... but I can hardly imagine how this data will be used later unless gathered into some array or hash. )

There is a good reason for this! Talking to guys who work in video on demand, it seems that it is standard practice to do this for file delivery requirements. As each video file must be identical upon receipt as it was upon delivery (and the files are all treated as unique delivery instances), a separate accompanying file is required. I thought that Perl would be a good choice for accomplishing this requirement as it is renowned for file handling.

#

Thanks to everyone for your help and contributions - particularly Jim, Shlomi, John and Igor. I have learned crazy amounts already!

Happy New Year to you all!

Jonathan
Re: File Size Script Help - Working Version
On Thu, Dec 29, 2011 at 03:43:19PM +, Jonathan Harris wrote:

> Hi All

Hello Jonathan:

(Disclaimer: I stayed up all night playing Skyrim and am running on about 4.5 hours of sleep.. ^_^)

I think most things have already been addressed, but I think Igor might have had a bit of trouble making it clear.

> opendir (my $in, $path) or die "Cannot open $dir: $!\n";
> find (\&wanted, $path);
> close $in;
>
> opendir (my $totalin, $path) or die "Cannot open $dir: $!\n";
> find (\&cleanup, $path);
> close $totalin;

AFAICT, it's completely nonsensical to open a directory file handle surrounding File::Find::find. I tried to /search the perldoc (just in case there's some kind of magical optimization or something) and saw no mention of 'opendir' or 'handle' (except for the special _ file handle created for stat, lstat, etc..). So it seems $in and $totalin are completely unnecessary here: File::Find will worry about opening and processing the directories for you.

> sub wanted {
>     while ($dircontents = readdir($in)) {

I guess this is why you are opening directory handles above, but it doesn't really make sense. You're basically only using File::Find to loop at this point, and very obscurely. :) File::Find's role in life is precisely to find you all the files within a directory tree. You're reinventing the square wheel with your use of opendir and readdir. :)

The wanted subroutine is typically used to either process the file system tree outright, or store applicable files in data structures for later processing. E.g.,

    use strict;
    use warnings;

    use File::Find;

    my @files;

    sub wanted {
        # Skip dot files and directories.
        return if substr($_, 0, 1) eq '.' || -d $_;

        # If current file is a normal file, push into array for later.
        push @files, $File::Find::name if -f $_;
    }

    my $path = '.';

    find \&wanted, $path;

    # Now @files should be filled with a recursive list of files to
    # process. E.g.,
    for my $file (@files) {
        my $md5name = $file . '.md5';
        # Etc...
    }

> my $hex = Digest::MD5->new->addfile($fh)->hex digest;

I assume you meant `hexdigest' here, not 'hex digest'.

> $newname =~ s/\ //;

Ideally if you're going to do something as obscure as this, you should comment it in both places so future readers and maintainers understand why it's done, even if they only read one half of the program. I think Igor has already explained how to eliminate this obscurity though. :)

Regards,

-- 
Brandon McCaig bamcc...@gmail.com bamcc...@castopulence.org
Castopulence Software https://www.castopulence.org/
Blog http://www.bamccaig.com/
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }. q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.}; tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'
Re: File Size Script Help - Working Version
On Fri, Dec 30, 2011 at 7:11 PM, Brandon McCaig bamcc...@gmail.com wrote:

> AFAICT, it's completely nonsensical to open a directory file handle surrounding File::Find::find. So it seems $in and $totalin are completely unnecessary here: File::Find will worry about opening and processing the directories for you.

*snip*

> You're basically only using File::Find to loop at this point, and very obscurely. :) File::Find's role in life is precisely to find you all the files within a directory tree. You're reinventing the square wheel with your use of opendir and readdir. :)
>
> The wanted subroutine is typically used to either process the file system tree outright, or store applicable files in data structures for later processing.

HI Brandon

Thanks for your response.

I totally agree with your first point. Having now used File::Find a little more, I have seen that using opendir was totally unnecessary and have removed them from the script. And guess what... it works fine without them! I think that my initial confusion arose from fundamentally misunderstanding File::Find - thinking that it required a handle, not just a path. I have also now exchanged the while loop with a foreach loop - much better!

> I assume you meant `hexdigest' here, not 'hex digest'.

You assume correctly! Gmail has started to do annoying auto text complete - I must turn it off!!

> push @files, $File::Find::name if -f $_;

This is nice and clean. Your approach is different to what we have been discussing. You seem to gather the files with File::Find and then leave that sub alone asap. The processing is then done on the results of that gathering. My script left the processing within sub wanted. This could possibly be a reason that complications arose so quickly.

To get file names and sizes at the same time, I am also considering

    my %files;

    sub wanted {
        my $filesize = (stat($_))[7];
        push @{$files{$filesize}}, $File::Find::name;
    }

    find(\&wanted, $path);

to hash files and file size results together - then process after.

And yep, Igor has been thorough and very helpful. Thanks again for your input on this - hope you manage to get some sleep!

All the best

Jonathan
Re: File Size Script Help - Working Version
Great work, Jonathan! Notice how simple your script has become - and that's a good sign as well in Perl. :) We can make it even simpler, however.

As you probably know, Perl has two fundamental types of collections: arrays (where data is stored as a sequence of elements, data chunks) and hashes (where data chunks are unordered, but stored with some unique key used to retrieve them). Sometimes hashes are used just to sort out (non-)unique data, but that's another story. Now look at this line:

    push @{$files{$filesize}}, $File::Find::name;

Don't you see something... weird? You're using a hash where filesizes are the keys - and because, yes, they may well be non-unique, you have to store arrays of filenames in your hash instead... But much more natural (at least, for me) is to organize your hash (let's call it %filedata) so that filenames (which are unique by their nature) become the keys. And some info about these files - sizes and md5-hashes - become the values.

For example, our `wanted` (btw, its name is misleading a bit, no? may be 'process' will sound better?) sub may look as follows:

    find(\&wanted, $path);

    my %filedata;

    sub wanted {
        return if substr($_, 0, 1) eq '.' || -d $_;

        my $filesize = -s _;

        open my $fh, '<', $_ or die $!, $/;
        my $filemd5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;

        $filedata{$_} = [$filesize, $filemd5];
    }

(*Notice how you don't have to declare the global filedata hash before the callback function is called in `find`? It's a really interesting topic.*)

Then you'll just have to iterate over %filedata - and it's as easy as writing...

    for my $filename (keys %filedata) {
        my ($size, $md5) = @{ $filedata{$filename} };

        open my $fh, '>', File::Spec->catfile($path, "$filename.md5") or die $!, $/;
        print $fh "$filename\t$size bytes\t$md5\n";
        close $fh;
    }

... yep, that easy. )

-- iD

P.S. Ordering the output should be an easy task for you; hint - look up the 'sort' documentation - or just use the sort system routine. :)

2011/12/31 Jonathan Harris jtnhar...@googlemail.com

> HI Brandon
>
> Thanks for your response. I totally agree with your first point. Having now used File::Find a little more, I have seen that using opendir was totally unnecessary and have removed them from the script. And guess what... it works fine without them!

*snip*

> And yep, Igor has been thorough and very helpful. Thanks again for your input on this - hope you manage to get some sleep!
>
> All the best
>
> Jonathan
Re: File Size Script Help - Working Version
Igor Dovgiy wrote:

> Great work, Jonathan! Notice how simple your script has become - and that's a good sign as well in Perl. :) We can make it even simpler, however.
>
> *snip*
>
> Now look at this line:
>
>     push @{$files{$filesize}}, $File::Find::name;
>
> Don't you see something... weird? You're using a hash where filesizes are the keys - and because, yes, they may well be non-unique, you have to store arrays of filenames in your hash instead... But much more natural (at least, for me) is to organize your hash (let's call it %filedata) so that filenames (which are unique by their nature) become the keys. And some info about these files - sizes and md5-hashes - become the values.

Yes, file names in a given directory _have_ to be unique, however...

> For example, our `wanted` (btw, its name is misleading a bit, no? may be 'process' will sound better?) sub may look as follows:
>
>     find(\&wanted, $path);
>
>     my %filedata;
>
>     sub wanted {
>         return if substr($_, 0, 1) eq '.' || -d $_;
>
>         my $filesize = -s _;
>
>         open my $fh, '<', $_ or die $!, $/;
>         my $filemd5 = Digest::MD5->new->addfile($fh)->hexdigest;
>         close $fh;
>
>         $filedata{$_} = [$filesize, $filemd5];

You are traversing a directory tree, so using $_ as the key may cause collisions across different directories. Better to use $File::Find::name which contains the full absolute path name.

John

-- 
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: File Size Script Help - Working Version
On Thu, Dec 29, 2011 at 5:08 PM, Igor Dovgiy ivd.pri...@gmail.com wrote:

Hi Jonathan,

Let's review your script a bit, shall we? ) It's definitely good for a starter, but still has some rough places.

  #!/usr/bin/perl
  # md5-test.plx
  use warnings;
  use strict;
  use File::Find;
  use Digest::MD5;
  use File::Spec;

So far, so good. )

  my $dir = shift || '/Users/jonharris/Documents/begperl';

Nice touch, setting up a default param. ) The name of the variable might seem too generic to some, but then again, it's the only directory we deal with, so...

  my ($dircontents, $path, @directory, $fh, $wr_fh);

Incoming! Well, it's usually better to declare your variables right before you'll really need them... Your script is short, so you'll hardly have a chance to forget what $fh and $wr_fh mean, though. )

  @directory = $dir;
  $path = File::Spec->catfile( @directory, $dircontents );

Ahem. At least three 'wtf' moments for me. ) First of all, File::Spec->catfile is really just a glorified join operator with some additional operations depending on which system you're using. So, second, it makes little sense to convert $dir into @directory (the documentation example is just that, an example) and to pass there the undefined $dircontents as well. But the major one is: why would you ever have to pass your $dir through File::Spec? It's, you know, user input... )

  opendir (my $in, $path) or die "Cannot open $dir: $!\n";

So you're trying to open $path, but warn about failure to open $dir? ) But then again, that's a minor quarrel, considering this:

  find (\&wanted, $path);

See, File::Find is a convenient method which _emulates_ the whole opendir-readdir-closedir pattern for a given directory. The 'wanted' subroutine (passed by ref) will be called for each file found in $path. It's described really well in perldoc (perldoc File::Find).

  close $in;

Opendir, but close - and not closedir? Now I'm confused. )

  opendir (my $totalin, $path) or die "Cannot open $dir: $!\n";
  find (\&cleanup, $path);
  close $totalin;

You don't have to use a different variable to store a temporary file handle (which will be closed in three lines) - and that will save you a few moments spent working out a new (but rational) name for it. :) But then again, you don't need to open the same dir twice: you can call cleanup (with the same 'find (\&cleanup)...' syntax) whenever you want. And you don't really need cleanup... whoops, going too far too soon. :)

  print "\n\nAll Done!\n\n";

  sub wanted {
    while ($dircontents = readdir($in)) {

Did I say that you're using two alternative methods of doing the same thing? ) But there's another big 'no-no' here: you're using an external variable ($dircontents) when you really have perfectly zero reasons to do so. Of course, you don't need to push the dirhandle ($in) from outer scope into this sub when using find... ($File::Find::dir will do), but that's explainable at least. )

  if ($dircontents=~/^\./ || -d $dircontents) {
    next;
  }

So now the script ignores all the files whose names begin with '.', and you really wanted just to ignore '.' and '..' ... )

  my $bytes = -s $dircontents;
  print $dircontents, "\n";
  print $bytes, " bytes", "\tSo far so good!\n";

Yes. )

  open $fh, "<", $dircontents or die $!;
  open $wr_fh, "+>", "$path $dircontents.md5" or die $!;
  ## was unable to concatenate here, hence sub cleanup to remove the ' '

What was wrong with...

  open my $wr_fh, '>', File::Spec->catfile($path, $dircontents, '.md5') or die $!, $/

?

  my $hex = Digest::MD5->new->addfile($fh)->hex digest;
  print "Hex Digest: ", $hex, "\n\n";
  print $wr_fh $hex, "\n", $bytes, "\n\n";

All looks great for now: you're calculating md5 and size, and writing them into a file with an md5 extension...

  return($hex);

... but now you're abruptly jumping out of the while block, making the whole point of the cycle, well, completely pointless. Not great.

  close $wr_fh;
  close $fh;
    }
  }

  # The following is mostly not original code - thanks to the author!
  sub cleanup {
    my @filelist = readdir($totalin);
    foreach my $oldname (@filelist) {
      next if -d $oldname;
      my $newname = $oldname;
      $newname =~ s/\ //;

So you don't have spaces in your filenames. Great for you. )

      rename $oldname, $newname;
    }
  }
  # End #

And here we finish. Computers are not smart. They're dumb. But they're fast. And obedient. ) That's why they're really helpful in letting you do what you're trying to do... but only if you KNOW what you're trying to do. Imagine that you - and not your computer - will be doing this task. Sit in one place - and write down your program as you (and not your computer) would be running it. Step by step. Bit by bit. Then convert your notes into some Perl form - and you'll instantly see the difference between now and then. )

-- iD

Hi Igor

Many thanks for your response. I have started reviewing the things you said. There are some silly mistakes in there - eg not using closedir. It's a good lesson in script vigilance. I found the part about opening the file handle particularly interesting - I had no idea that

  open my $wr_fh, '>', File::Spec->catfile($path, $dircontents, '.md5') or die $!, $/

was possible. Now it's time to sit down and digest all this... and rewrite the script to make it better!
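For reference, the two skip conditions contrasted in the review look like this side by side (a sketch; note that the second form also skips files such as .DS_Store, which may or may not be what you want):

  next if $_ eq '.' or $_ eq '..';   # skip only the two special directory entries
  next if /^\./ || -d $_;            # skip every dotfile and every directory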
re: File Size Script Help - Working Version
Hi All

Firstly, many thanks for your help previously (19/12/11) - it has led to making a useable script. I don't think it's brilliantly written, it seems a little bodged together to me... but it works fine - not a bad result for a first script. If you are new to this problem and are interested in the previous thread, I have attached it for you as a text file.

I have done everything I can think of now to follow the previous advice. The script is portable, skips directories, creates digests of files, and uses better Perl practice (e.g. no more barewords, correct lexical file handles etc).

I only have a couple of questions left - wondering if you can help:

- One thing that was recommended was to ensure that the file handles are opened outside of the loop. I really can't figure out how to do this and keep the program working! Doesn't it need to be inside the loop to be iterated over?
- Also, I had to open two file handles to get addfile()->hexdigest to work so that the value could be passed - this can't be correct?!
- Writing back was messy - I was struggling with concatenating variables to keep the script portable.
- Is it possible to put values into a hash, and then print each hash entry to a separate file?

There are clearly better ways to achieve this result - all suggestions are gratefully received!

Thanks again
Jon

Here's the script:

  # This program reads in files from a directory, produces a hex digest and writes the hex, along with
  # the file size, into a newly created file with the same name and a '.md5' extension, to the original directory

  #!/usr/bin/perl
  # md5-test.plx
  use warnings;
  use strict;
  use File::Find;
  use Digest::MD5;
  use File::Spec;

  my $dir = shift || '/Users/jonharris/Documents/begperl';
  my ($dircontents, $path, @directory, $fh, $wr_fh);
  @directory = $dir;
  $path = File::Spec->catfile( @directory, $dircontents );
  opendir (my $in, $path) or die "Cannot open $dir: $!\n";
  find (\&wanted, $path);
  close $in;
  opendir (my $totalin, $path) or die "Cannot open $dir: $!\n";
  find (\&cleanup, $path);
  close $totalin;
  print "\n\nAll Done!\n\n";

  sub wanted {
    while ($dircontents = readdir($in)) {
      if ($dircontents=~/^\./ || -d $dircontents) {
        next;
      }
      my $bytes = -s $dircontents;
      print $dircontents, "\n";
      print $bytes, " bytes", "\tSo far so good!\n";
      open $fh, "<", $dircontents or die $!;
      open $wr_fh, "+>", "$path $dircontents.md5" or die $!;
      ## was unable to concatenate here, hence sub cleanup to remove the ' '
      my $hex = Digest::MD5->new->addfile($fh)->hex digest;
      print "Hex Digest: ", $hex, "\n\n";
      print $wr_fh $hex, "\n", $bytes, "\n\n";
      return($hex);
      close $wr_fh;
      close $fh;
    }
  }

  # The following is mostly not original code - thanks to the author!
  sub cleanup {
    my @filelist = readdir($totalin);
    foreach my $oldname (@filelist) {
      next if -d $oldname;
      my $newname = $oldname;
      $newname =~ s/\ //;
      rename $oldname, $newname;
    }
  }
  # End #
Re: File Size Script Help - Working Version
Jonathan Harris wrote:
> Hi Igor
> Many thanks for your response. I have started reviewing the things you said. There are some silly mistakes in there - eg not using closedir. It's a good lesson in script vigilance. I found the part about opening the file handle particularly interesting - I had no idea that
>
>   open my $wr_fh, '>', File::Spec->catfile($path, $dircontents, '.md5') or die $!, $/
>
> was possible. Now it's time to sit down and digest all this... and rewrite the script to make it better!

Igor made a lot of good points. Here are my two cents worth.

You are using the File::Find module to traverse the file system and add new files along the way. This _may_ cause problems on some file systems. It would probably be better to get a list of applicable files first and then use that list to create your new files.

And you should have some way to handle the situation where a file exists that already has an '.md5' file, or an '.md5' file exists with no corresponding plain file.

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
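A sketch of the two-pass shape John describes, including the skip condition he suggests for existing digests (names are illustrative; $path is the top directory as in the script above):

  use File::Find;

  my @files;
  find(sub {
      return if -d $_ || /\.md5$/;        # ignore directories and the digest files themselves
      push @files, $File::Find::name;
  }, $path);

  for my $file (@files) {                  # traversal is finished; safe to create files now
      next if -e "$file.md5";              # this file already has a digest
      # ... compute the digest and write "$file.md5" here ...
  }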
Re: File Size Script Help - Working Version
On Thu, Dec 29, 2011 at 6:39 PM, John W. Krahn jwkr...@shaw.ca wrote:
> You are using the File::Find module to traverse the file system and add new files along the way. This _may_ cause problems on some file systems. It would probably be better to get a list of applicable files first and then use that list to create your new files. And you should have some way to handle the situation where a file exists that already has an '.md5' file, or an '.md5' file exists with no corresponding plain file.

Hi John

Thanks for your 2 cents. I hadn't considered that the module wouldn't be portable. If that is the case, then maybe it would be best to ditch File::Find altogether? Have you had experience with the module causing issues on certain systems? It would be a shame, as I've just got it working! - Thanks to Igor, I no longer use the unnecessary dir handles!

I agree that it may be worth examining the directory for existing .md5 files and skipping them. I'll look into adding that to the code.

All the best and thanks for your help
Jonathan
Re: File Size Script Help - Working Version
Hi All

Final question for Igor. I tried to use your suggestion

  open my $wr_fh, '>', File::Spec->catfile($path, $dircontents, '.md5') or die $!, $/

but it returned an error on the command line: 'Not a directory'. At which point the program dies (which is what it is supposed to do!). I used it inside the loop - sorry to bug you for clarification.

  if ($dircontents=~/^\./ || -d $dircontents) {
    next;
  }

This is also to avoid the file .DS_Store.

Finally, I was advised by a C programmer to declare all variables at the start of a program to avoid memory issues. Is this not necessary in Perl?

The rest of it is going really well - hope to post new and improved code soon!
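The 'Not a directory' error is consistent with how catfile joins its arguments: every argument becomes a separate path component. A sketch, with made-up paths:

  use File::Spec;

  # Each argument becomes a path component:
  File::Spec->catfile('/some/path', 'file.mov', '.md5');
  #   gives  /some/path/file.mov/.md5  - but file.mov is a plain file,
  #   not a directory, so opening that path fails with 'Not a directory'.

  # Concatenate the extension onto the filename instead:
  File::Spec->catfile('/some/path', "file.mov.md5");
  #   gives  /some/path/file.mov.md5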
Re: File Size Script Help - Working Version
Jonathan Harris wrote:
> On Thu, Dec 29, 2011 at 6:39 PM, John W. Krahn jwkr...@shaw.ca wrote:
>> Igor made a lot of good points. Here are my two cents worth. You are using the File::Find module to traverse the file system and add new files along the way. This _may_ cause problems on some file systems. It would probably be better to get a list of applicable files first and then use that list to create your new files. And you should have some way to handle the situation where a file exists that already has an '.md5' file, or an '.md5' file exists with no corresponding plain file.
>
> Hi John
> Thanks for your 2 cents. I hadn't considered that the module wouldn't be portable.

That is not what I was implying. I was saying that when you add new files to a directory that you are traversing you _may_ get irregular results. It depends on how your operating system updates directory entries.

> If that is the case, then maybe it would be best to ditch File::Find altogether? Have you had experience with the module causing issues on certain systems? It would be a shame, as I've just got it working! - Thanks to Igor, I no longer use the unnecessary dir handles!
> I agree that it may be worth examining the directory for existing .md5 files and skipping them. I'll look into adding that to the code.

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: File Size Script Help - Working Version
Jonathan Harris wrote:
> Finally, I was advised by a C programmer to declare all variables at the start of a program to avoid memory issues. Is this not necessary in Perl?

It is not really necessary in C either.

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
re: File Size Script Help
Hi Perl Pros

This is my first call for help. I am a totally new, self-teaching Perl hopeful. If my approach to this script is simply wrong, please let me know, as it will help my learning!

The script aims to:

1) Read in a directory either from the command line, or from a default path
2) Produce a hash for a future checksum
3) Write this (hex digest) to a separate file, in a sub directory of the parent, which has the same name and a .md5 extension
4) Check the original file for its size
5) Add this data to the newly created file on a new line (in bytes)

I have a script that will do most of this, except for analysing the file size - I think that the file size being analysed may be the md5 object result, as the same value is printed to each file. I am running out of ideas and would appreciate any help you could give! I have tried using File::stat::OO and File::stat - but to no avail - I could be using them incorrectly!

Many thanks in advance...
Jon.

Here are some details:

System: Mac OSX 10.7.2
Perl version 5.12

Script:

  #!/usr/bin/perl
  # md5-test-3.plx
  use warnings;
  use strict;
  use Digest::MD5;

  my $filesize = 0;
  my $dir = shift || '/Users/jonharris/Documents/begperl';
  opendir (DH, $dir) or die "Couldn't open directory: $!";
  my $md5 = Digest::MD5->new;

  while ($_ = readdir(DH)) {
    $md5->add($_);
    $filesize = (stat(DH))[7];

Is it necessary to put the following into a new loop?

    foreach ($_) {
      open FH, ">>/Users/jonharris/Documents/begperl/md5/$_.md5" or die $!;
      binmode(FH);
      print FH $md5->hexdigest, "\n", $filesize;
    }
    close FH;
  }
  close DH;
  print "\n$dir\n\n";

  ###
Re: File Size Script Help
On 12/19/11 Mon Dec 19, 2011 11:32 AM, Jonathan Harris jtnhar...@googlemail.com scribbled:

> This is my first call for help. I am a totally new, self-teaching Perl hopeful. If my approach to this script is simply wrong, please let me know, as it will help my learning!
> The script aims to:
> 1) Read in a directory either from the command line, or from a default path
> 2) Produce a hash for a future checksum

A hash of what? File names (directory contents) or file contents?

> 3) Write this (hex digest) to a separate file, in a sub directory of the parent, which has the same name and a .md5 extension

Same name as the file or same name as the directory?

> 4) Check the original file for its size
> 5) Add this data to the newly created file on a new line (in bytes)

Will this file contain information for one file or many files?

> I have a script that will do most of this, except for analysing the file size - I think that the file size being analysed may be the md5 object result, as the same value is printed to each file

Print out the file size returned by stat. Check if it is the same displayed by the ls command.

> I am running out of ideas and would appreciate any help you could give! I have tried using File::stat::OO and File::stat - but to no avail - I could be using them incorrectly!

I am afraid I do not understand exactly what you are trying to accomplish. I can't tell from your program whether or not you will end up with one digest file for the entire directory, or one digest file for each file in the directory.

> my $filesize = 0;

You should declare variables where they are needed and not before.

> opendir (DH, $dir) or die "Couldn't open directory: $!";
> my $md5 = Digest::MD5->new;
> while ($_ = readdir(DH)) {

You are better off using a scalar variable and not the default variable, which can get reused and overwritten:

  while( my $file = readdir(DH) ) {

> $md5->add($_);

You are adding file names to a string to be digested. Is that what you want? Or do you want to calculate a digest for the contents of each file? I have not used Digest::MD5, but if you want to calculate the digest for the contents of each file, you want to create a new digest object, open the file, and use the addfile() method, then hexdigest(), for each file.

> $filesize = (stat(DH))[7];

You are applying stat to the directory read handle. You want to fetch data for the file (untested):

  my $filesize = (stat("$dir/$file"))[7];

Note that you must prefix the file name with its full path.

> Is it necessary to put the following into a new loop?

No. It makes no sense to have a one-iteration loop.

> foreach ($_) {
>   open FH, ">>/Users/jonharris/Documents/begperl/md5/$_.md5" or die $!;

You are appending lines to a file with a name that is based on an existing file. Why?

>   binmode(FH);

There is no need to set the mode of the output file to binary. Both the hexdigest and the file size will be written in ascii characters, not binary data.

>   print FH $md5->hexdigest, "\n", $filesize;
> }
> close FH;
> close DH;
> print "\n$dir\n\n";
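Putting Jim's corrections together, the loop body might look something like this (a sketch only, assuming the use lines from the original script and that the md5 subdirectory already exists):

  while ( my $file = readdir(DH) ) {
      next if $file eq '.' or $file eq '..';
      my $filesize = (stat("$dir/$file"))[7];          # stat the file, not the dirhandle
      open my $in,  '<', "$dir/$file"         or die $!;
      open my $out, '>', "$dir/md5/$file.md5" or die $!;
      my $digest = Digest::MD5->new->addfile($in)->hexdigest;   # fresh object per file
      print $out "$digest\n$filesize";
      close $in;
      close $out;
  }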
Re: File Size Script Help
Hi Jonathan,

some comments on your code - both positive and negative.

On Mon, 19 Dec 2011 19:32:10 +0000 Jonathan Harris jtnhar...@googlemail.com wrote:

> #!/usr/bin/perl
> # md5-test-3.plx
> use warnings;
> use strict;

strict and warnings are good.

> use Digest::MD5;

So is using a module.

> my $filesize = 0;

You shouldn't predeclare your variables.

> my $dir = shift || '/Users/jonharris/Documents/begperl';
> opendir (DH, $dir) or die "Couldn't open directory: $!";

Don't use bareword dir handles - use lexical ones. It's good that you're using the "or die" thing, though.

> my $md5 = Digest::MD5->new;

Seems like you're using the same $md5 object time and again, which will calculate cumulative MD5 sums instead of per-file ones.

> while ($_ = readdir(DH)) {

1. You're not skipping "." and "..".
2. You're not skipping other directories.
3. The $_ variable can be easily devastated. You should use a lexical one.

> $md5->add($_);

According to http://metacpan.org/module/Digest::MD5 the add() method adds data, and here it will only add the filename. You need to use addfile() with an opened file handle instead.

> $filesize = (stat(DH))[7];

You shouldn't stat the directory handle. Instead stat "$dir/$filename" (you can also use the core File::Spec module if you want to make it extra portable).

> Is it necessary to put the following into a new loop?
> foreach ($_) {

Why the foreach ($_) here? It does nothing. You're already iterating on the files.

> open FH, ">>/Users/jonharris/Documents/begperl/md5/$_.md5" or die $!;
> binmode(FH);
> print FH $md5->hexdigest, "\n", $filesize;
> }
> close FH;

Use lexical file handles here, use three-args open - «open my $fh, '>>', '/Users...'» - and don't open it time and again. Keep it outside the loop.

For more information see: http://perl-begin.org/tutorials/bad-elements/

Regards,

Shlomi Fish

--
Shlomi Fish http://www.shlomifish.org/
Chuck Norris/etc. Facts - http://www.shlomifish.org/humour/bits/facts/
We don't know his cellphone number, and even if we did, we would tell you that we didn't know it.
Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: File Size Script Help
On Mon, Dec 19, 2011 at 8:08 PM, Jim Gibson jimsgib...@gmail.com wrote:

Hi Jim

Thanks for responding. Here are some answers to your questions:

> A hash of what? File names (directory contents) or file contents?

The aim is to produce a digest for each file in the folder.

> Same name as the file or same name as the directory?

Each file created will have the same filename as the original, but with the extension '.md5'.

> Will this file contain information for one file or many files?

This file will contain information for only one file. This is so that when looking at the treated directory, the end result will be:

  some_movie.mov
  some_movie.mov.md5
  another_movie.mov
  another_movie.mov.md5
  etc...

> Print out the file size returned by stat. Check if it is the same displayed by the ls command.

I really have! However, this is where I am coming unstuck.

> I am afraid I do not understand exactly what you are trying to accomplish. I can't tell from your program whether or not you will end up with one digest file for the entire directory, or one digest file for each file in the directory.

The program creates one digest file for each file in the directory : )

> You should declare variables where they are needed and not before.

Noted.

> You are better off using a scalar variable and not the default variable, which can get reused and overwritten.
Re: File Size Script Help
On Mon, Dec 19, 2011 at 8:09 PM, Shlomi Fish shlo...@shlomifish.org wrote:

Hi Shlomi

Thanks for your response. To answer your questions:

> You shouldn't predeclare your variables.

Noted, thanks.

> Don't use bareword dir handles - use lexical ones. It's good that you're using the "or die" thing, though.

Of course - will have to re-read the section on barewords...

> Seems like you're using the same $md5 object time and again, which will calculate cumulative MD5 sums instead of per-file ones.

In which case, I think that it will have to go in the loop so that a new instance is produced each time?

> 1. You're not skipping "." and "..".
> 2. You're not skipping other directories.

I'm sure I read somewhere that the parent was automatically skipped. Must be getting confused. However, adding this will be ok:

  next if $_ eq '.' or $_ eq '..';

> 3. The $_ variable can be easily devastated. You should use a lexical one.

I'll certainly use lexical ones in the future.

> According to http://metacpan.org/module/Digest::MD5 the add() method adds data, and here it will only add the filename. You need to use addfile() with an opened file handle instead.

That makes sense.

> You shouldn't stat the directory handle. Instead stat "$dir/$filename" (you can also use the core File::Spec module if you want to make it extra portable).
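On the "new instance each time" question: either approach should work. The Digest::MD5 documentation describes the digest operation as a destructive, read-once operation that automatically resets the object, so a single object created outside the loop can be reused; creating one per file, as below, is arguably clearer. A sketch:

  my $hex = Digest::MD5->new->addfile($fh)->hexdigest;   # fresh object, used once

  # or, reusing one $md5 object created outside the loop:
  # $md5->addfile($fh);
  # my $hex = $md5->hexdigest;   # reading the digest resets $md5 for the next file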
Query Online File Size
Hi,

I am writing a small script to download files off the web. How can I get the file size without downloading the file?

  use LWP::Simple;
  my $file = "http://www.abc.com/file.mp3";
  my @array = head($file);
  print "$array[1]\n";

head() doesn't always return all values - why?? Sometimes all the values are there, sometimes @array is empty! Should I try LWP::UserAgent, or is there any other way?
Re: Query Online File Size
  use LWP::UserAgent;

  sub GetFileSize {
    my $url = shift;
    $ua = new LWP::UserAgent;
    $ua->agent("Mozilla/5.0");
    my $req = new HTTP::Request 'HEAD' => $url;
    $req->header('Accept' => 'text/html');
    $res = $ua->request($req);
    if ($res->is_success) {
      my $headers = $res->headers;
      return $headers;
    }
    return 0;
  }

  $link = 'http://www.abc.com/file.mp3';
  $header = GetFileSize($link);
  print "File size: " . $header->content_length . " bytes\n";
  exit;

On Thu, Nov 26, 2009 at 12:28 PM, raphael() raphael.j...@gmail.com wrote:
> Hi,
> I am writing a small script to download files off the web. How can I get the file size without downloading the file?
> Should I try LWP::UserAgent, or is there any other way?
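A couple of caveats on the snippet above: when the request fails the sub returns 0, and calling content_length on that will blow up; servers are also free to omit the Content-Length header entirely. A slightly more defensive sketch using LWP::UserAgent's own head() method (the URL is the one from the question):

  use strict;
  use warnings;
  use LWP::UserAgent;

  my $ua  = LWP::UserAgent->new( timeout => 30 );
  my $res = $ua->head('http://www.abc.com/file.mp3');

  if ( $res->is_success and defined $res->content_length ) {
      print "File size: ", $res->content_length, " bytes\n";
  }
  else {
      print "No size available: ", $res->status_line, "\n";
  }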
File Size Limit in Archive::Perl
Hi All,

Is there any way to limit the file size while zipping using Archive::Zip, so that it will stop processing a zip operation on a file list when it crosses the maximum file size?

Thanks in advance.
-A
Re: File Size Limit in Archive::Perl
San wrote:
> Is there any way to limit the file size while zipping using Archive::Zip, so that it will stop processing a zip operation on a file list when it crosses the maximum file size?

Hey San

Unfortunately Archive::Zip requires that an archive be written to disk before the compression is performed and the final size can be determined. But it is possible to compress files individually and write them to a temporary file to determine their size, and then add the same archive member to the final archive without repeating the compression.

Take a look at the program below, which is written for a Windows system but should be easily portable.

HTH, Rob

  use strict;
  use warnings;

  use Archive::Zip qw/AZ_OK/;
  use File::Temp qw/tempfile/;

  use constant MB => 1024 * 1024;

  my $dir = 'C:';
  my @files = do {
    opendir my $fd, "$dir\\" or die $!;
    grep -f, map "$dir\\$_", readdir $fd;
  };

  my $zip = Archive::Zip->new;
  my $total;
  my $limit = 50*MB;

  foreach my $file (@files) {
    my $temp = Archive::Zip->new;
    my $member = $temp->addFile($file);
    next unless $member->compressedSize;
    my $fh = tempfile();
    $temp->writeToFileHandle($fh) == AZ_OK or die $!;
    $zip->addMember($member);
    $total += $member->compressedSize;
    die "$total bytes exceeds archive size limit" if $total > $limit;
  }

  print "Total archive size: $total bytes\n\n";

  $zip->writeToFileNamed('archive.zip') == AZ_OK or die $!;
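If creating temporary files is undesirable, the same trick may work with an in-memory filehandle instead of File::Temp (an untested sketch; it assumes Perl 5.8+, whose in-memory handles support seek, which writeToFileHandle relies on):

  my $temp   = Archive::Zip->new;
  my $member = $temp->addFile($file);

  open my $fh, '+>', \my $buffer or die $!;     # in-memory scratch "file"
  $temp->writeToFileHandle($fh) == AZ_OK or die $!;
  close $fh;

  my $size = $member->compressedSize;           # now known without touching disk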
Re: maximum file size for while(<FILE>) loop?
On Sat, 2007-01-20 at 09:31 +1100, Ken Foskey wrote:
>> What's exactly the difference between: ++$lines; and $lines++; ?
> Nothing in this context.

What about other contexts?

David.
Re: maximum file size for while(<FILE>) loop?
David Moreno Garza wrote on Sunday, 21 January 2007 07:50:
> On Sat, 2007-01-20 at 09:31 +1100, Ken Foskey wrote:
>>> What's exactly the difference between: ++$lines and $lines++; ?
>> Nothing in this context.
> What about other contexts?

Hi David

  #!/usr/bin/perl
  use strict;
  use warnings;

  { # preincrement
    my (%h, $i);
    $h{++$i} = 'hi';
    print keys %h, ", $i\n";
  }

  { # postincrement
    my (%h, $i);
    $h{$i++} = 'hi';
    print keys %h, ", $i\n";
  }

  __END__

  1, 1
  0, 1

The difference is the order of "read current value" (used here as the hash key) and "increment current value" (done by the ++ operator). There's no difference between standalone ++$lines and $lines++ because only the increment takes place, and the result value is not used in the same expression.

See also perldoc perlop, "Auto-increment and Auto-decrement".

Hope this helps!

Dani
Re: maximum file size for while(<FILE>) loop? - maybe HASH problem?
On 1/19/07, Bertrand Baesjou [EMAIL PROTECTED] wrote:
> While running my script it seems to use around a gigabyte of memory (there is 1GB of RAM and 1GB of swap in the system), might this be the problem?

If you're running low on memory, unless you're working on an inherently large problem, your algorithm is probably wasting some memory.

> foreach $line (<INFILE>) {
>   $somestorage{$linecounter}=$value;
>   $linecounter++;
> }

Well, that builds a big hash for nothing. Unless you're trying to waste memory?

> print $linecounter;

You should probably put a newline at the end of your output.

> system("pwd") == 0 or die "system failed: $?";
> 5198365system failed: 0 at ./sample1.pl line 22.

You're trying to run the command pwd, which seems to have failed. The value of $? is zero though, which would normally indicate success. Perhaps it's zero because the command couldn't be executed at all? (Maybe low memory?) Does the pwd command normally work from system()? (You're not comparing this pwd to the shell built-in, are you?)

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
maximum file size for while(<FILE>) loop?
Hi,

I am trying to read data from a file. I do this by using the while (<FILE>) { ... } construction. However with files roughly bigger than 430MB in size it seems to crash the script :S Syntax seems all fine (perl -wc - syntax OK). I was thinking that maybe it was running to the end of a 32-bit counter (but that would be 536MB, right?)? Can anybody offer another solution to work with such large files and Perl?

Tnx,
Bertrand
Re: maximum file size for while(<FILE>) loop?
On Fri, 2007-01-19 at 13:16 +0100, Bertrand Baesjou wrote:
> I am trying to read data from a file. I do this by using the while (<FILE>) { ... } construction. However with files roughly bigger than 430MB in size it seems to crash the script :S Syntax seems all fine (perl -wc - syntax OK). I was thinking that maybe it was running to the end of a 32-bit counter (but that would be 536MB, right?)? Can anybody offer another solution to work with such large files and Perl?

No idea, a little script sample might be good.

--
Ken Foskey
FOSS developer
Re: maximum file size for while(<FILE>) loop?
Bertrand Baesjou wrote:
> I am trying to read data from a file. I do this by using the while (<FILE>) { ... } construction. However with files roughly bigger than 430MB in size it seems to crash the script :S Syntax seems all fine (perl -wc - syntax OK).

How does your script crash? What are the symptoms?

> I was thinking that maybe it was running to the end of a 32-bit counter (but that would be 536MB, right?)? Can anybody offer another solution to work with such large files and Perl?

People have read files of several gigabytes with Perl. The problem is more likely to lie with what you do with the data once you have read it. To prove this for yourself, set this code against the same file:

  my $lines = 0;
  while (<FILE>) {
    ++$lines;
  }
  print $lines;

and I am pretty sure that won't crash. Then try to simplify your code by removing stuff from the loop until the problem goes away. The last thing you removed contains the cause of the crash.

If you need to post again, please give us comprehensive details of the crash.

HTH,

Rob
Re: maximum file size for while(<FILE>) loop? - maybe HASH problem?
Ken Foskey wrote:
> No idea, a little script sample might be good.

While running my script it seems to use around a gigabyte of memory (there is 1GB of RAM and 1GB of swap in the system), might this be the problem? The script below gives the error:

  #!/usr/local/bin/perl

  # use POSIX;

  my $inputFile = $ARGV[0];    # file we are reading from
  my $outputFile = "./tmp/overall-memory.dat";

  my %somestorage;
  my $linecounter = 0;
  my $value = 40;
  my $bool = 0;

  open(INFILE, $inputFile) or die("Could not open log file.");    # open for input
  open(OUTFILE, $outputFile);

  foreach $line (<INFILE>) {
    $somestorage{$linecounter} = $value;
    $linecounter++;
  }

  close(INFILE);
  close(OUTFILE);

  print $linecounter;

  system("pwd") == 0 or die "system failed: $?";

  # wc -l samples/1169209055.trcxml
  5198365 samples/1169209055.trcxml
  # ./sample1.pl samples/1169209055.trcxml
  5198365system failed: 0 at ./sample1.pl line 22.

Any ideas how to solve my problem?

Tnx!
Bertrand
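Since %somestorage grows by one entry per line, a 5-million-line file easily explains a gigabyte of memory. If all that's actually needed is the line count, a streaming version keeps memory flat (a sketch reusing the same variable names):

  my $linecounter = 0;
  open( INFILE, $inputFile ) or die("Could not open log file.");
  while ( my $line = <INFILE> ) {    # one line in memory at a time
      $linecounter++;                # no hash, so nothing accumulates
  }
  close(INFILE);
  print "$linecounter\n";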
Re: maximum file size for while(<FILE>) loop? - maybe HASH problem?
On Fri, Jan 19, 2007 at 03:17:19PM +0100, Bertrand Baesjou wrote:
> foreach $line (<INFILE>) {

See, this isn't a while loop, as you have in the subject. That is the cause of your problems.

--
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net
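The difference Paul is pointing at, spelled out:

  foreach my $line (<INFILE>) { ... }    # reads the WHOLE file into a list first
  while ( my $line = <INFILE> ) { ... }  # reads one line per iteration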
Re: maximum file size for while(<FILE>) loop?
On Fri, 2007-01-19 at 13:24 +0000, Rob Dixon wrote:
> ++$lines;

What's exactly the difference between:

  ++$lines;

and

  $lines++;

?

David.
Re: maximum file size for while(<FILE>) loop?
On Fri, 2007-01-19 at 16:21 -0600, David Moreno Garza wrote:
> On Fri, 2007-01-19 at 13:24 +0000, Rob Dixon wrote:
>> ++$lines;
> What's exactly the difference between: ++$lines; and $lines++; ?

Nothing in this context.

It does make a difference if you are 'using' the value. See the sample and try it:

  $lines = 10;
  $value = ++$lines;
  print "first value - is it 10 or 11? $value\n";
  $value = $lines++;
  print "second value - is it 11 or 12? $value\n";

--
Ken Foskey
FOSS developer
Re: maximum file size for while(<FILE>) loop?
David Moreno Garza wrote:
> On Fri, 2007-01-19 at 13:24 +0000, Rob Dixon wrote:
>> ++$lines;
> What's exactly the difference between: ++$lines; and $lines++; ?

In void context they are both the same, because perl optimizes $lines++ to ++$lines.

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
getting the File size of image URL
Is it possible to calculate the size of a file served over HTTP? I.e. if I wanted to know the file size of http://www.yahoo.com/images/a.gif from Perl...

Thanks
Anish
RE: getting the File size of image URL
Anish Kumar K. [EMAIL PROTECTED] asked:
> Is it possible to calculate the size of a file served over HTTP? I.e. if I wanted to know the file size of http://www.yahoo.com/images/a.gif from Perl...

Send a HEAD request for the URI and look at the Content-Length header of the response object:

  #!/usr/bin/perl -w
  use strict;
  use LWP::UserAgent;

  my $url = 'http://us.i1.yimg.com/us.yimg.com/i/yahoo.gif';
  my $ua = new LWP::UserAgent(timeout => 60);
  my $response = $ua->head( $url );

  if( $response->is_success ){
    print 'The image size is ' . $response->header('Content-Length') . " bytes.\n";
  }
  else {
    die 'The request was unsuccessful: ' . $response->status_line;
  }

  __END__

HTH,
Thomas
Re: getting the File size of image URL
From the $ENV{CONTENT_LENGTH}.

-Original Message-
From: Anish Kumar K. [EMAIL PROTECTED]
Sent: Oct 11, 2006 2:17 AM
To: beginners@perl.org
Subject: getting the File size of image URL

> Is it possible to calculate the size of a file served over HTTP? I.e. if I wanted to know the file size of http://www.yahoo.com/images/a.gif from Perl...

--
Jeff Pang
NetEase AntiSpam Team
http://corp.netease.com
Re: File Size Calculator
Jose Alves De Castro wrote:
> On Mon, 2004-08-09 at 14:53, David Dorward wrote:
>> On 9 Aug 2004, at 14:34, SilverFox wrote:
>>> Hi all, I'm trying to write a script that will allow a user to enter a number and that number will be converted into KB, MB or GB depending on the size of the number. Can someone point me in the right direction?
>> What have you got so far? Where are you stuck? Getting user input (where from)? Working out which order of magnitude the number is?
> I wouldn't do that (the part of finding the order of magnitude)... I would probably keep on doing calculations while the number was greater than 1024... and in the end, the right letter to append would be based on the number of calculations done... I remember reading something about this on use.perl... it was a while ago, and I'm not sure whether it ever got into a module, but the guy had written some wonderful code to do this :-)
>> Converting between kilo and mega et al? Showing the output? Show us some code.
>> --
>> David Dorward http://dorward.me.uk/ http://blog.dorward.me.uk/

I haven't put anything together as yet. Putting some if/elsif statements together would be the easiest way I can think of. Something like:

  $kilo = 1024;
  $Mega = 1048576;
  $gig  = 1073741824;

  print "Please enter your number:\n";
  chomp($num=<STDIN>);

  if ($num >= $gig) {
    # need code to do the conversion/rounding of the given number
    print "you entered: $num\n";
    print "which is: ";
  } elsif {
    # continue with the same format
  }

The problem I'm having is converting/rounding the inputted number into a valid byte (KB/MB/GB) count.

SilverFox
Re: File Size Calculator
The quickest way would be to take the left-over off from the beginning:

  ...
  print "Please enter your number:\n";
  chomp($num=<STDIN>);
  $bytes = $num % $kilo;
  $num -= $bytes;
  ...

HTH,
Mark G.

- Original Message -
From: SilverFox [EMAIL PROTECTED]
Date: Monday, August 9, 2004 12:06 pm
Subject: Re: File Size Calculator
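Another common shape for this, repeatedly dividing by 1024 until the number fits its unit, in the spirit of Jose's earlier suggestion (a sketch; rounding to one decimal place is an arbitrary choice):

  my @units = ('bytes', 'KB', 'MB', 'GB', 'TB');
  my $num   = 59443;                       # sample value from the thread
  my $i     = 0;
  while ( $num >= 1024 and $i < $#units ) {
      $num /= 1024;
      $i++;
  }
  printf "%.1f %s\n", $num, $units[$i];    # prints "58.0 KB" for 59443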
Re: File Size Calculator
And the clouds parted, and SilverFox said... Hi all, I'm trying to write a script that will allow a user to enter a number and that number will be converted into KB, MB or GB depending on the size of the number. Can someone point me in the right direction? Example: user enter: 59443 Script will output: 58M SilverFox Here's a little chunk that should give you about what you're looking for, up to Tebibytes (2**40 bytes). Note that I used the binary prefixes[1] (Kibi, Mebi, Gibi, Tebi) as opposed to the base-10 versions (Kilo, Mega, Giga, Tera). Feel free to change them if you're so inclined. :)

--- Begin Chunk ---
our %ByteCount = ( B   => 1,
                   KiB => 2**10,
                   MiB => 2**20,
                   GiB => 2**30,
                   TiB => 2**40 );

sub prettybyte {
    my $bytes = shift;
    foreach my $unit ( qw{ TiB GiB MiB KiB B } ) {
        if ($bytes >= $ByteCount{$unit}) {
            return sprintf("%4.3f $unit", $bytes/$ByteCount{$unit});
        }
    }
}
--- End Chunk ---

HTH[2]- Brian [1] http://www.alcyone.com/max/reference/physics/binary.html -anyone remember offhand the URL to the /. story on these, btw? [2] I'm a little rushed at the moment, so I don't have time to fill in any details of how it works. Let me know if you want/need an explanation and I'll be happy to provide one. :) /~~\ | Brian Gerard Some drink at the fountain of| | First initial + 'lists' knowledge...others just gargle. | | at technobrat dot com | \__/ -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
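A quick way to exercise Brian's chunk; the expected output assumes the prefix table exactly as written above:

print prettybyte(59443), "\n";           # 58.050 KiB
print prettybyte(5_000_000_000), "\n";   # 4.657 GiB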
RE: File Size Calculator
SilverFox [EMAIL PROTECTED] wrote: : : I haven't put anything together as yet. Putting : some if/elsif statements together would be the : easiest way I can think of. Something like: We can see a few problems right off. All scripts should start with 'strict' and 'warnings'. We need a consistent naming convention for variables. If some start with capitalized letters, there should be a non-arbitrary reason for doing so.

: $kilo= 1024;
: $Mega= 1048576;
: $gig= 1073741824;

use strict;
use warnings;

my $kilobyte = 1024;
my $megabyte = 1024 ** 2;
my $gigabyte = 1024 ** 3;

: print "Please enter your number:\n";
: chomp($num=<STDIN>);

chomp( my $value = <STDIN> );

: if ($num >= $gig) {
:     # need code to do the conversion/rounding of the given number
:     print "you entered: $num\n";
:     print "which is: ";
: } elsif {
:     continue with the same format
: }
:
: The problem I'm having is converting/rounding the
: entered number into a valid byte (KB/MB/GB) count.

I suppose that takes some basic math. To round to a whole number, we examine the fraction to determine whether we should adjust the whole number up or down. It is important to separate the number from the fraction. Luckily, Math::Round has the nearest() function. To find the nearest number of units, we use this.

nearest( $unit, $number ) / $unit

Here's one solution. It's very generic. One could easily adapt it for miles, yards, and feet given a value in feet. I leave error checking to you.

use strict;
use warnings;

use Math::Round 'nearest';

print "Please enter your number:\n";
chomp( my $value = <STDIN> );

my %units = (
    1024      => 'KB',
    1024 ** 2 => 'MB',
    1024 ** 3 => 'GB',
);

foreach my $unit ( sort { $b <=> $a } keys %units ) {
    if ( $value >= $unit ) {
        printf "%s = %s %s\n",
            $value, nearest( $unit, $value ) / $unit, $units{ $unit };
        last;
    }
}

__END__

We still need to handle values smaller than 1024, but this solution might make that easier to do. It won't handle non-positive values, though.

my %units = (
    1024 ** 0 => 'Bytes',
    1024 ** 1 => 'KB',
    1024 ** 2 => 'MB',
    1024 ** 3 => 'GB',
);

HTH, Charles K. Clarkson -- Mobile Homes Specialist 254 968-8328 -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
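Putting Charles's two pieces together gives a complete, runnable version -- a sketch assuming Math::Round is installed; as he notes, zero and negative values still fall through:

use strict;
use warnings;
use Math::Round 'nearest';

print "Please enter your number:\n";
chomp( my $value = <STDIN> );

my %units = (
    1024 ** 0 => 'Bytes',
    1024 ** 1 => 'KB',
    1024 ** 2 => 'MB',
    1024 ** 3 => 'GB',
);

# Largest unit first; the Bytes entry catches everything from 1 to 1023.
foreach my $unit ( sort { $b <=> $a } keys %units ) {
    if ( $value >= $unit ) {
        printf "%s = %s %s\n",
            $value, nearest( $unit, $value ) / $unit, $units{ $unit };
        last;
    }
}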
Re: File Size Calculator
And the clouds parted, and Brian Gerard said... [1] http://www.alcyone.com/max/reference/physics/binary.html -anyone remember offhand the URL to the /. story on these, btw? ...never mind. Found it. (uncaught typo on my first google query... DOH!) http://slashdot.org/articles/99/08/10/0259245.shtml /~~\ | Brian Gerard Give me liberty or give me something | | First initial + 'lists' of equal or lesser value | | at technobrat dot com from your glossy 32-page catalog. | \__/ -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
File Size Calculator
Hi all, I'm trying to write a script that will allow a user to enter a number and that number will be converted into KB, MB or GB depending on the size of the number. Can someone point me in the right direction? Example: user enter: 59443 Script will output: 58M SilverFox -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: File Size Calculator
On 9 Aug 2004, at 14:34, SilverFox wrote: Hi all, I'm trying to write a script that will allow a user to enter a number and that number will be converted into KB, MB or GB depending on the size of the number. Can someone point me in the right direction? What have you got so far? Where are you stuck? Getting user input (where from)? Working out which order of magnitude the number is? Converting between kilo and mega et al? Showing the output? Show us some code. -- David Dorward http://dorward.me.uk/ http://blog.dorward.me.uk/ -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: File Size Calculator
On Mon, 2004-08-09 at 14:53, David Dorward wrote: On 9 Aug 2004, at 14:34, SilverFox wrote: Hi all, I'm trying to write a script that will allow a user to enter a number and that number will be converted into KB, MB or GB depending on the size of the number. Can someone point me in the right direction? What have you got so far? Where are you stuck? Getting user input (where from)? Working out which order of magnitude the number is? I wouldn't do that (the part of finding the order of magnitude)... I would probably keep on doing calculations while the number was greater than 1024... and in the end, when it no longer was, the right letter to append would be based on the number of calculations done... I remember reading something about this on use.Perl ... it was a while ago, and I'm not sure whether it ever got into a module, but the guy had written some wonderful code to do this :-) Converting between kilo and mega et al? Showing the output? Show us some code. -- David Dorward http://dorward.me.uk/ http://blog.dorward.me.uk/ -- José Alves de Castro [EMAIL PROTECTED] http://natura.di.uminho.pt/~jac
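José's divide-until-small approach, spelled out -- a minimal sketch (not the use.Perl code he mentions), assuming base-1024 steps and one decimal of output:

use strict;
use warnings;

sub human_size {
    my $n = shift;
    my @suffix = ('', 'K', 'M', 'G', 'T');

    # Keep dividing while the number is 1024 or more; the number
    # of divisions performed picks the letter to append.
    my $steps = 0;
    while ($n >= 1024 && $steps < $#suffix) {
        $n /= 1024;
        $steps++;
    }
    return sprintf "%.1f%s", $n, $suffix[$steps];
}

print human_size(59443), "\n";    # prints "58.0K"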
Re: File Size Calculator
On Mon, 9 Aug 2004, SilverFox wrote: Example: user enter: 59443 Script will output: 58M I know this isn't getting into the spirit of things, but have you considered simply using the `units` program?

% units
500 units, 54 prefixes
You have: 59443 bytes
You want: megabytes
        * 0.056689262
        / 17.640025
You have: 59443 bytes
You want: kilobytes
        * 59.443
        / 0.016822839
You have: ^C
% units bytes kilobytes
        * 0.001
        / 1000
% units bytes megabytes
        * 9.5367432e-07
        / 1048576

The nice thing about `units` -- in this context -- is that it lets the user pick the conversion units they want to work with, and also gives hints for converting both to and from the alternate measurement scale. Of course, working this into a larger program that does other things might be annoying -- in which case your way is better -- but if all you want is the conversions, this is a solved problem :-) -- Chris Devers -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Get file size without downloading
Hi, Is there any way to get the size of a file without downloading it? I want to write a program using LWP to download a file only if it is bigger than 3K but smaller than 500K. So I need to know the file size in the first place. Thank you. -u -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
RE: Get file size without downloading
usef wrote: Hi, Is there any way to get the size of a file without downloading it? I want to write a program using LWP to download a file only if it is bigger than 3K but smaller than 500K. So I need to know the file size in the first place. You issue a HEAD request to the server and look at the Content-Length response header. You can use the LWP::Simple module's head method to get this information easily. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
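In case a concrete example helps: LWP::Simple's head() in list context returns the content type, document length, modified time, expires, and server values. A sketch with a hypothetical URL, applying the 3K-500K rule from the question:

use strict;
use warnings;
use LWP::Simple;

my $url = 'http://www.example.com/some/file.zip';   # hypothetical
my ($type, $length, $mtime) = head($url)
    or die "HEAD request failed\n";

# Only worth downloading between 3K and 500K.
if (defined $length && $length > 3 * 1024 && $length < 500 * 1024) {
    print "Worth downloading: $length bytes\n";
}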
Re: Get file size without downloading
Hi, FTP or HTTP? HTTP, but I want to know the method for FTP as well. Thanks -u PS. Sorry Rus for multiple copy *smacks forehead* -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
RE: Get file size without downloading
usef wrote: Hi, FTP or HTTP? HTTP, but I want to know the method for FTP as well. Thanks -u I think that will work for FTP as well. Give it a try. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: Get file size without downloading
On Wed, 10 Dec 2003, usef wrote: Hi, Is there any way to get the size of a file without downloading it? I want to write a program using LWP to download a file only if it is bigger than 3K but smaller than 500K. So I need to know the file size in the first place. Hi, FTP or HTTP? Rgds Rus -- e: [EMAIL PROTECTED] | Linux + FreeBSD Servers from $12.50/mo e: [EMAIL PROTECTED] | Full Root Access m: +44 7919 373537 | Free Trial Account t: 1-888-327-6330 | http://www.jvds.com -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
RE: Get file size without downloading
Hello, usef [EMAIL PROTECTED] asked: Is there any way to get the size of a file without downloading it? I want to write a program using LWP to download a file only if it is bigger than 3K but smaller than 500K. So I need to know the file size in the first place. Try making a HEAD request - that should return file size and last modification date. This obviously will not work for CGI URLs. HTH, Thomas -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
RE: Get file size without downloading
Is there any way to get the size of a file without downloading it? I want to write a program using LWP to download a file only if it is bigger than 3K but smaller than 500K. So I need to know the file size in the first place. Try making a HEAD request - that should return file size and last modification date. This obviously will not work for CGI URLs. Something like:

my $ua = LWP::UserAgent->new();
my $result = $ua->head($url);
my $remote_headers = $result->headers;
$total_size = $remote_headers->content_length;

-- Morbus Iff ( i put the demon back in codemonkey ) Culture: http://www.disobey.com/ and http://www.gamegrene.com/ My book, Spidering Hacks: http://amazon.com/exec/obidos/ASIN/0596005776/ icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
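Expanded into a complete gatekeeper for the 3K-500K rule -- a sketch, assuming the server sends a Content-Length header (not all do) and a hypothetical URL and output filename:

use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.example.com/big/file.dat';   # hypothetical
my $ua  = LWP::UserAgent->new();

my $head = $ua->head($url);
die "HEAD failed: ", $head->status_line, "\n" unless $head->is_success;

my $size = $head->headers->content_length;
die "Server sent no Content-Length\n" unless defined $size;

if ($size > 3 * 1024 && $size < 500 * 1024) {
    # Only now transfer the body, saving it straight to disk.
    my $res = $ua->get($url, ':content_file' => 'file.dat');
    print "Downloaded $size bytes\n" if $res->is_success;
}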
RE: Get file size without downloading
On Wed, 2003-12-10 at 09:42, Bob Showalter wrote: usef wrote: Hi, FTP or HTTP? HTTP, but I want to know the method for FTP as well. Thanks -u I think that will work for FTP as well. Give it a try. If I type ls when I FTP into somewhere I get a listing of files and size. I would guess either: a) ls is a command in FTP b) there is a corresponding command in the FTP module you are using. -Dan -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
RE: Get file size without downloading
Dan Anderson [EMAIL PROTECTED] wrote: On Wed, 2003-12-10 at 09:42, Bob Showalter wrote: usef wrote: Hi, FTP or HTTP? HTTP, but I want to know the method for FTP as well. Thanks -u I think that will work for FTP as well. Give it a try. If I type ls when I FTP into somewhere I get a listing of files and size. I would guess either: a) ls is a command in FTP b) there is a corresponding command in the FTP module you are using. In standard ftp, you can do 'size'. The Net::FTP module for Perl has a corresponding command also. When you are in standard ftp, just type

ftp> size yourFile

and it will return something like

ftp> size yourFile
213 12523

where 213 is the message number and (in my case) 12523 is the size of the file. -Jeff __ Do you Yahoo!? New Yahoo! Photos - easier uploading and sharing. http://photos.yahoo.com/ -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
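The Net::FTP equivalent, for reference -- a sketch with a hypothetical host, login, and path:

use strict;
use warnings;
use Net::FTP;

my $ftp = Net::FTP->new('ftp.example.com')    # hypothetical host
    or die "Cannot connect: $@";
$ftp->login('anonymous', 'me@example.com')
    or die "Login failed: ", $ftp->message;

# size() issues the FTP SIZE command and returns bytes (or undef
# if the server does not support it).
my $size = $ftp->size('pub/yourFile');
print defined $size ? "yourFile is $size bytes\n"
                    : "Server does not support SIZE\n";

$ftp->quit;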
Re: Reduce file size with Imager
R. Joseph Newton wrote: Eamon Daly wrote: Hi, all. I'm using Imager to create gifs, but the resultant file sizes are /huge/. I'm writing the files out like so: Are you doing animations? If not, skip the GIFs. You can get much better depth [16 million] in a lot less space with JPEG files. Some of the compression algorithms available are loss-free, too. No matter how small the color table, each pixel is still going to take its one byte when using GIF. I see that you set a gif_eliminate_unused flag, but I am sort of sceptical about how effective this will really be. I have never heard of a GIF making such optimizations. Joseph I'm sorry, but there are numerous errors here. First of all, I know nothing about this Imager package (and indeed, very little about Perl), but I suspect the other reply was correct. Check the documentation and you will probably find that the package simply does not do compression of GIF images. It will write uncompressed GIFs, but not compressed ones. The reason is that GIF uses the LZW compression algorithm, which is patented in some countries (the patent only recently expired in the US), and threats of royalty fees and legal action have caused MANY free software packages to stop supporting GIF compression. Second, JPEG (actually JFIF) is a very poor replacement for GIF. JPEG compression is good for photographs and other realistic images, but not for icons, cartoons, and other things that GIF works well for (things that have relatively few colors and/or large blocks of the same color). When attempting to compress a cartoon, for example, you'll find that JPEG/JFIF will give *lower* quality and a *larger* file size than GIF. For this type of image, PNG-8 would be a better choice than GIF, and a much better choice than JPEG/JFIF. Third, only in relatively bad cases will GIF require a byte for every pixel. For example, I just created a solid white 200 by 200 image. That's 40,000 pixels. The file size is 345 bytes. One byte per pixel is what you would get if no compression was used at all (probably what happened in this case, but not what happens in general), or if the compression performed so badly that it might as well have not been used (which is rare for typical images). -Kevin -- My email address is valid, but changes periodically. To contact me please use the address from a recent posting. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: Reduce file size with Imager
Kevin Goodsell wrote: Third, only in relatively bad cases will GIF require a byte for every pixel. For example, I just created a solid white 200 by 200 image. That's 40,000 pixels. The file size is 345 bytes. One byte per pixel is what you would get if no compression was used at all (probably what happened in this case, but not what happens in general), or if the compression performed so badly that it might as well have not been used (which is rare for typical images). -Kevin Seriously? I guess I was going by what I have seen in full-color images. I may have dismissed the GIF protocol too quickly, when I was doing a lot of graphics work. I'll take another look at it. I notice now that I can easily raise information on the format through Google, which wasn't really the case when I last looked for background on graphics encoding. Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: Reduce file size with Imager
Kevin Goodsell wrote: Third, only in relatively bad cases will GIF require a byte for every pixel. For example, I just created a solid white 200 by 200 image. That's 40,000 pixels. The file size is 345 bytes. One byte per pixel is what you would get if no compression was used at all (probably what happened in this case, but not what happens in general), or if the compression performed so badly that it might as well have not been used (which is rare for typical images). -Kevin Thanks again for the correction. It has spurred some new exploration. I've been looking at the published standard on the format, and it is not at all like I had assumed. I'm afraid I was lumping it in with BMP and TIFF. Anyway, I am starting to untangle the coding: Greetings!

E:\d_drive\perlStuff>perl -w
open IN, 'fullhead.gif';
binmode IN;
local $/;
my $img = <IN>;
my @bytes = split //, $img;
my $gif_type;
for (1..6) {
$gif_type .= shift @bytes;
}
print "$gif_type\n";
my $width = ord(shift @bytes);
$width += 256 * ord(shift @bytes);
my $height = ord(shift @bytes);
$height += 256 * ord(shift @bytes);
print "Width: $width Height: $height\n";
my $control_string = ord (shift @bytes);
my $is_map = $control_string / 128;
$control_string %= 128;
my $bit_resolution = int(($control_string / 16) + 1);
$control_string %= 16;
$control_string %= 2;
my $bits_per_pixel = $control_string;
my $background_color = ord(shift @bytes);
print "Background is $background_color\n";
my $color_map = ord(shift @bytes);
print "Color map is $color_map\n";
my @colors;
for (my $i = 0; $i < 2 ** $bit_resolution; $i++) {
my $color_channels = {};
$color_channels->{'red'} = ord(shift @bytes);
$color_channels->{'green'} = ord(shift @bytes);
$color_channels->{'blue'} = ord(shift @bytes);
push @colors, $color_channels;
print 'R: ', sprintf ("%03d", $color_channels->{'red'}),
' G: ', sprintf ("%03d", $color_channels->{'green'}),
' B: ', sprintf ("%03d", $color_channels->{'blue'}), "\n";
}
foreach my $char (@bytes) {
my $byte = ord($char);
my $first_nibble = int($byte / 16);
my $crumbs = $byte % 16;
print "$first_nibble\n$crumbs\n";
}
print 'Data size was ', my $byte_size = @bytes, "\n";
^Z
GIF89a
Width: 30 Height: 16
Background is 0
Color map is 0
R: 000 G: 000 B: 000
R: 128 G: 000 B: 000
...
2
1
15
9
0
4
0
1
0
0
...
3
11
Data size was 117

Right now, I'm sort of tracking as I read the spec. I swear to Gawd, I couldn't find anything like this last time I went a-hunting! It's not very often that you'll see me writing this much flush-left script, but right now I just want to follow a file through sequentially, and deal with each part as it comes. Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: Reduce file size with Imager
Eamon Daly wrote: Hi, all. I'm using Imager to create gifs, but the resultant file sizes are /huge/. I'm writing the files out like so: Are you doing animations? If not, skip the GIFs. You can get much better depth [16 million] in a lot less space with JPEG files. Some of the compression algorithms available are loss-free, too. No matter how small the color table, each pixel is still going to take its one byte when using GIF. I see that you set a gif_eliminate_unused flag, but I am sort of sceptical about how effective this will really be. I have never heard of a GIF making such optimizations. Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Reduce file size with Imager
Hi, all. I'm using Imager to create gifs, but the resultant file sizes are /huge/. I'm writing the files out like so:

$img->write(type => 'gif',
            max_colors => 16,
            gif_eliminate_unused => 1,
            data => \$data)
    or die $img->errstr;

I've verified that the resulting color table /is/ only 16 colors. Even so, I've opened the resulting files in several different graphic editors and saved, and those files are an order of magnitude smaller than the ones Imager produces. I've tried defining a color map and used several different variations on make_colors and translate, but the files still seem abnormally large. Is this just a limitation in libgif? Any suggestions? Eamon Daly -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
recursively finding file size/timestamp on a Mac / Solaris
Hi, This is what I'm doing... but it's not working :-(

use File::Find;    # for find()
use File::stat;    # so stat() returns an object with size/mtime methods

find(\&wanted, $Root);

print OUT '</table> <p>&nbsp;</p> </body> </html>';

sub wanted() {
    if (-d $File::Find::name) {
        return;
    }
    $file = $File::Find::name;
    $file =~ s/\//\\/g;
    $st = stat($file);
    $size = $st->size;
    $size = ($size/1024) . "KB ($size bytes)";
    $time = scalar localtime $st->mtime;
}

-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
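For what it's worth, here is a self-contained version that prints each file's size and timestamp -- a sketch with a hypothetical starting directory; no_chdir keeps $File::Find::name usable as a stat()-able path throughout the walk:

use strict;
use warnings;
use File::Find;
use File::stat;

my $Root = '/some/dir';    # hypothetical starting directory

find({ wanted => \&wanted, no_chdir => 1 }, $Root);

sub wanted {
    return if -d $File::Find::name;               # skip directories
    my $st = stat($File::Find::name) or return;   # File::stat object
    my $kb = sprintf "%.1f", $st->size / 1024;
    my $time = scalar localtime $st->mtime;
    print "$File::Find::name: ${kb}KB (", $st->size, " bytes), modified $time\n";
}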
Re: File size problem
Vasudev.K. wrote: Hi, I have this rather critical problem, I am trying to download quite huge files from a remote server through ftp. (The file being in a zipped format). I have an array which stores the names of the files to be downloaded. I am opening each of them up at my end and extracting data out of it and storing it in my oracle database. Q1. After unzipping, the file is huge (even the zipped one is :(( ).. almost 5GB. The system throws an error "File too large" and exits. How do I get around this ache? One way I want to do it is to split the unzipped file into many parts and process each part separately, but I can't get to writing the code for it. Is there any better solution? This sounds like a problem with your zip application, or perhaps the limitations of the format. Q2. Is there a way I can do all these things in parallel? i.e. downloading... unzipping... data extraction... database operations. Here is the script.. if somebody can help me optimize it. I'm not sure about optimizations at this point, because there are some formatting issues that make it unnecessarily difficult to follow the code. Herewith some tips on indentation:

use strict;
use warnings;

@filenames=("A.gz","B.gz", ...);

# lose this, and all other meaningless blank lines.
# use blank lines only when they signal some meaningful transition
# in the flow of execution.

open(ZP,"database_zipped_archive.dat");
while (@filenames) {

All code within this block should be indented by however many spaces you are using as your standard. Use spaces, not tabs.

[EMAIL PROTECTED];
$ftp->get($ftpfile);
$unzippedftpfile="unzipped.txt";
open IN,"gzip -d < $ftpfile > $unzippedftpfile |";
close(IN);
$subst=substr($_,0,2);
open(ZNP,"tempfile.txt") or die "tempfile not ready: $!\n";
    while (<ZNP>) # This line should not be indented

The control statement of the while loop is in the main flow of execution, as is the opening brace. Anyway, I am sorry that I cannot take the time to read this code for content. Please clean up the indentation so that it is not random, but rather reflects the logic of program execution. Then re-post the code. Here are the basics: Choose a number of space characters as a standard indent increment. Tastes vary, but choose a standard and stick with it. Any time you have an opening brace for a set of lines, this should be a signal to indent the following line by one increment. Unindent by one increment on the line that has the closing brace. The closing brace should always have a line to itself. Here is a sample using a 4-space indent [and one space for line continuance]. Some people prefer up to eight spaces. I personally use two. Just be consistent with whatever you choose.

sub add_interior_children {
    my $self = shift;
    my ($roots, $class_system, $resolved, $unresolved, $to_close) = @_;
    while (my $key = shift @{$unresolved}) {
        my $parent = $class_system->get_parent($key);
        next unless $parent;
        if ($resolved->{$parent}) {
            $self->add_interior_child($key, $parent, $class_system,
             $resolved, $to_close);
        }
        else {
            push @{$unresolved}, $key;
        }
    }
}

When you can look at the code from arm's length and tell, without reading it, how the general flow of execution goes, repost it, and you should get plenty of help with the logic. Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
File size problem
Hi, I have this rather critical problem, I am trying to download quite huge files from a remote server through ftp. (The file being in a zipped format). I have an array which stores the names of the files to be downloaded. I am opening each of them up at my end and extracting data out of it and storing it in my oracle database. Q1. After unzipping, the file is huge (even the zipped one is :(( ).. almost 5GB. The system throws an error "File too large" and exits. How do I get around this ache? One way I want to do it is to split the unzipped file into many parts and process each part separately, but I can't get to writing the code for it. Is there any better solution? Q2. Is there a way I can do all these things in parallel? i.e. downloading... unzipping... data extraction... database operations. Here is the script.. if somebody can help me optimize it.

@filenames=("A.gz","B.gz", ...);
open(ZP,"database_zipped_archive.dat");
while (@filenames) {
[EMAIL PROTECTED];
$ftp->get($ftpfile);
$unzippedftpfile="unzipped.txt";
open IN,"gzip -d < $ftpfile > $unzippedftpfile |";
close(IN);
$subst=substr($_,0,2);
open(ZNP,"tempfile.txt") or die "tempfile not ready: $!\n";
while (<ZNP>) {
if ($subst=~/XXX/) {
Some Operations .
push(@XXX,xxx);
}
if ($subst=~/YYY/) {
Some Operations .
push(@YYY,y);
}
.
.
.
}
$filenumber++;
}
my $th = $db->prepare("INSERT INTO XXX_Table VALUES (?,?)");
[EMAIL PROTECTED];
while (@XXX) {
while ($checkorg < $len) {
$th->bind_param(1, $checkorg);
$th->bind_param(2, @XXX->[$checkorg]);
my $rc = $th->execute || die "Can't execute statement: $DBI::errstr";
$checkorg++;
}
}
$checkorg=0;
my $th = $db->prepare("INSERT INTO YYY_Table VALUES (?,?)");
[EMAIL PROTECTED];
while (@YYY) {
while ($checkorg < $len) {
$th->bind_param(1, $checkorg);
$th->bind_param(2, @YYY->[$checkorg]);
my $rc = $th->execute || die "Can't execute statement: $DBI::errstr";
$checkorg++;
}
}
.
.
.

Thanks In advance -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: File size problem
-Original Message- Vasudev.K. wrote: . Q1. After unzipping, the file is huge (even the zipped one is :(( ).. almost 5GB. The system throws an errorFile too large and exits. How do I get around this ache? One way I want to do it is unzipped file into many parts and process each part separately but cant get to write a code for it. Is there any better solution? . Try hjsplit.exe to split a huge file into many pieces. It is a freeware product (available on Internet) that can manage file of 10GB and over. It doesn't need installation, it is a simple executable ... Then download each piece separately... E. LOQUENDO S.p.A. Vocal Technology and Services [EMAIL PROTECTED] CONFIDENTIALITY NOTICE This message and its attachments are addressed solely to the persons above and may contain confidential information. If you have received the message in error, be informed that any use of the content hereof is prohibited. Please return it immediately to the sender and delete the message. Should you have any questions, please contact us by replying to [EMAIL PROTECTED] Thank you -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
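If an external splitter is not an option, the splitting itself is only a few lines of Perl -- a rough sketch, not from the thread, assuming binary-safe reads and an arbitrarily chosen 1GB piece size (pieces may run slightly over, since the split happens on buffer boundaries):

use strict;
use warnings;

my $infile     = 'unzipped.txt';        # hypothetical input file
my $piece_size = 1024 * 1024 * 1024;    # ~1GB per piece
my $buf_size   = 1024 * 1024;           # copy 1MB at a time

open my $in, '<:raw', $infile or die "Cannot open $infile: $!";
my ($out, $part, $written) = (undef, 0, $piece_size);

while (my $got = read($in, my $buf, $buf_size)) {
    if ($written >= $piece_size) {
        # Current piece is full (or none open yet): start the next one.
        open $out, '>:raw', sprintf("%s.part%03d", $infile, $part++)
            or die "Cannot open piece: $!";
        $written = 0;
    }
    print {$out} $buf;
    $written += $got;
}
close $in;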
RE: File size problem
Ya.. I guess .. I got a part of the answer.. I am unzipping it onto STDOUT and reading it from there. But... still stuck with parallel processing :p:D -Original Message- From: Vasudev.K. [mailto:[EMAIL PROTECTED] Sent: Thursday, June 26, 2003 1:06 PM To: [EMAIL PROTECTED] Subject: File size problem -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
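In code, "unzipping onto STDOUT" usually looks like the following -- a minimal sketch, assuming gzip is on the PATH; the decompressed 5GB stream never touches the disk:

use strict;
use warnings;

my $ftpfile = 'A.gz';    # hypothetical downloaded archive

# gzip -dc writes the decompressed stream to stdout; the pipe open
# lets Perl read it line by line, so no huge temporary file is needed.
open my $zcat, '-|', "gzip -dc $ftpfile"
    or die "Cannot start gzip: $!";

while (my $line = <$zcat>) {
    my $subst = substr($line, 0, 2);
    # ... per-line processing, as in the original script ...
}
close $zcat;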
How do I find URL file size/last modified date
Hello, I have to get the size and last modified date of a remote file via URL without reading in the whole file. I have gone through LWP::UserAgent but couldn't make much headway. Any pointers on how to do it would be appreciated. TIA Shishir -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
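The usual answer (see also the "Get file size without downloading" thread above) is a HEAD request; a sketch with a hypothetical URL -- both values come from the response headers, so nothing beyond the headers is transferred:

use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.example.com/remote/file.pdf';   # hypothetical
my $ua  = LWP::UserAgent->new;

my $res = $ua->head($url);
die "HEAD failed: ", $res->status_line, "\n" unless $res->is_success;

my $size  = $res->content_length;    # bytes, may be undef
my $mtime = $res->last_modified;     # epoch seconds, may be undef

print "Size: ", defined $size ? $size : 'unknown', " bytes\n";
print "Last modified: ",
      defined $mtime ? scalar localtime $mtime : 'unknown', "\n";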
file size
Hi, I have an upload script, and i want to check the file size before it uploads. Any suggestion is appreciated Anthony -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: file size
Upload via FTP? Via a web based form? 18 questions left... John -Original Message- From: anthony [mailto:[EMAIL PROTECTED]] Sent: 22 February 2002 14:22 To: [EMAIL PROTECTED] Subject: file size Hi, I have an upload script, and i want to check the file size before it uploads. Any suggestion is appreciated Anthony -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --Confidentiality--. This E-mail is confidential. It should not be read, copied, disclosed or used by any person other than the intended recipient. Unauthorised use, disclosure or copying by whatever medium is strictly prohibited and may be unlawful. If you have received this E-mail in error please contact the sender immediately and delete the E-mail from your system. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: file size
anthony wrote: Hi, I have an upload script, and i want to check the file size before it uploads. Any suggestion is appreciated Anthony here's some old code that does that, might be something built-in in CGI.pm as well:

use File::Copy;    # for copy()

my $tempFile = CGI::tmpFileName($img_filename);
my @file_info = stat($tempFile);
if ($#file_info > 0 && $file_info[7] < $max_size && $file_info[7] > 0) {
    copy($tempFile, $path.$filename);
}
else {
    error_msg("Max size is $max_size");
}

-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: file size
Anthony == awards anthony writes: Anthony Hi, I have an upload script, and i want to check the file Anthony size before it uploads. The stat() function returns a list that includes file size as the seventh element. You can use: $size = (stat($filename))[7]; ... to retrieve the size of $filename in bytes. More information at perldoc -f stat, - Chris. -- $a=printf.net; Chris Ball | chris@void.$a | www.$a | finger: chris@$a In the beginning there was nothing, which exploded. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: file size
On Feb 22, Chris Ball said: Anthony == awards anthony writes: Anthony Hi, I have an upload script, and i want to check the file Anthony size before it uploads. The stat() function returns a list that includes file size as the seventh element. You can use: $size = (stat($filename))[7]; Or the -s file test as a shortcut to this same information: $size = -s $filename; More information at perldoc -f stat, And perldoc -f -X. -- Jeff japhy Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/ RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/ ** Look for Regular Expressions in Perl published by Manning, in 2002 ** stu what does y/// stand for? tenderpuss why, yansliterate of course. [ I'm looking for programming work. If you like my work, let me know. ] -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: file size
Hi, I tried $size = -s $filename but it didn't work. Anyway, I want my upload script to reject files that are bigger than 250KB. Anthony -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
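For a web form upload with CGI.pm, the size can be capped before the file is even accepted -- a sketch, assuming a hypothetical form field named 'file'; $CGI::POST_MAX is the module's built-in limit on the request body:

use strict;
use warnings;
use CGI;

$CGI::POST_MAX = 250 * 1024;    # refuse request bodies over 250KB

my $q = CGI->new;
if (my $err = $q->cgi_error) {
    # Oversized posts show up here, e.g. "413 Request entity too large".
    print $q->header('text/plain'), "Upload refused: $err\n";
    exit;
}

my $fh = $q->upload('file') or die "No upload field\n";
my $size = -s $fh;              # size of the spooled temporary file
print $q->header('text/plain'), "Received $size bytes\n";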
file size limitation
hello-- I was wondering if anyone knew what the file size limitation is for perl and if there are any workarounds. I tried to compile it with the gcc flags that are necessary for LFS, but to no avail. I don't have a problem with large files under other programs. Any ideas? I am running redhat - roswell and a 2.4.6 kernel. thanks, Michael Moore -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
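Before fighting the compiler flags further, it may help to confirm whether the running perl was built with large-file support at all (the same value is printed by perl -V:uselargefiles, and the corresponding Configure option is -Duselargefiles):

use strict;
use warnings;
use Config;

# 'define' means the perl binary was built with large-file support;
# 'undef' means files over 2GB will fail regardless of the kernel.
print "uselargefiles = ",
      defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef',
      "\n";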
file size
Hi, I would like to know if, with a perl script, you can get the size of a file? I need to get the sizes of 250 files on 250 computers ... thanx
Re: file size
-s as in:

perl -e 'print "$_: ", -s $_, "\n" for (glob ("*.*"))'

Hi, I would like to know if, with a perl script, you can get the size of a file? I need to get the sizes of 250 files on 250 computers ... thanx