Hi All

Firstly, many thanks for your help previously (19/12/11) - it has led to
making a useable script
I don't think it's brilliantly written, it seems a little bodged together
to me....... but works fine - not a bad result for a first script

If you are new to this problem and are interested in the previous thread, I
have attached it for you as a text file

I have done everything I can think of now to follow the previous advice
The script is portable, skips directories, creates digests of files, uses
better Perl practice (e.g., no more barewords, correct lexical file handles
etc)

I only have a couple of questions left - wondering if you can help :

- One thing that was recommended was to ensure that the File Handles are
opened outside of the loop
- I really can't figure out how to do this and keep the program working!
Doesn't it need to be inside the loop to be iterated over?

- Also, I had to open two file handles to get addfile()->hexdigest to work
so that the value could be passed - this can't be correct?!

- Writing back was messy - I was struggling with concatenating variables to
keep the script portable
- Is it possible to put values into a hash, and then print each hash entry
to a separate file?


There are clearly better ways to achieve this result - all suggestions are
gratefully received!

Thanks again

Jon


Here's the script:

#############

# This program reads in files from a directory, produces a hex digest and
writes the hex, along with
# the file size into a newly created file with the same name and a '.md5'
extension, to the original directory

#!/usr/bin/perl
# md5-test.plx
use warnings;
use strict;

use File::Find;
use Digest::MD5;
use File::Spec;

my $dir = shift || '/Users/jonharris/Documents/begperl';
my ($dircontents, $path, @directory, $fh, $wr_fh);

@directory = $dir;
$path = File::Spec->catfile( @directory, $dircontents );

opendir (my $in, $path) or die "Cannot open $dir: $!\n";
find (\&wanted, $path);
close $in;

opendir (my $totalin, $path) or die "Cannot open $dir: $!\n";
find (\&cleanup, $path);
close $totalin;

print "\n\nAll Done!\n\n";

sub wanted {
 while ($dircontents = readdir($in)) {
if ($dircontents=~/^\./ || -d $dircontents) {
next;
}
my $bytes = -s $dircontents;
print $dircontents, "\n";
print $bytes, " bytes", "\tSo far so good!\n";
 open $fh, "<", "$dircontents" or die $!;
open $wr_fh, "+>", "$path $dircontents.md5" or die $!;  ## was unable to
concatenate here, hence sub cleanup to remove the ' '

my $hex = Digest::MD5->new->addfile($fh)->hex digest;
print "Hex Digest: ", $hex, "\n\n";
print $wr_fh $hex, "\n", $bytes, "\n\n";
return($hex);

close $wr_fh;
close $fh;
}
}

# The following is mostly not original code - thanks to the author!
sub cleanup {
my @filelist = readdir($totalin);
foreach my $oldname (@filelist) {
next if -d $oldname;
my $newname = $oldname;
$newname =~ s/\ //;
rename $oldname, $newname;
}
}

##### End #####
Jonathan Harris
Dec 19 (10 days ago)

to beginners, me 
Hi Perl Pros

This is my first call for help

I am a totally new, self teaching, Perl hopeful

If my approach to this script is simply wrong, please let me know as it will 
help my learning!

The script aims to:

1) Read in a directory either from the command line, or from a default path
2) Produce a hash for future checksum
3) Write this (hex digest) to a separate file, in a sub directory of the 
parent, which has the same name and a .md5 extension
4) Check the original file for its size
5) Add this data to the newly created file on a new line (in bytes)

I have a script that will do most of this, except for analysing the file size - 
I think that the file size being analysed may be the md5 object result as the 
same value is printed to each file

I am running out of ideas and would appreciate any help you could give!
I have tried using File::stat::OO and File::stat  - but to no avail - I could 
be using them incorrectly!

Many thanks in advance...

Jon.

Here are some details:

System:
Mac OSX 10.7.2
Perl version 5.12

Script:

#!/usr/bin/perl
# md5-test-3.plx
use warnings;
use strict;
use Digest::MD5;

my $filesize = 0;
my $dir = shift || '/Users/jonharris/Documents/begperl';
opendir (DH, $dir) or die "Couldn't open directory: $!";

my $md5 = Digest::MD5->new;
while ($_ = readdir(DH)) {
        $md5->add($_);
        $filesize = (stat(DH))[7];
        
#### Is it necessary to put the following into a new loop?

        foreach ($_) {
                open FH, ">> /Users/jonharris/Documents/begperl/md5/$_.md5" or 
die $!;
                binmode(FH);
                print FH $md5->hexdigest, "\n", $filesize;
        }
        close FH;
}
close DH;
print "\n$dir\n\n";

#######

Jim Gibson via perl.org 
Dec 19 (10 days ago)

to beginners 
On 12/19/11 Mon  Dec 19, 2011  11:32 AM, "Jonathan Harris"
<jtnhar...@googlemail.com> scribbled:

> Hi Perl Pros
>
> This is my first call for help
>
> I am a totally new, self teaching, Perl hopeful
>
> If my approach to this script is simply wrong, please let me know as it
> will help my learning!
>
> The script aims to:
>
> 1) Read in a directory either from the command line, or from a default path
> 2) Produce a hash for future checksum

A "hash" of what? File names (directory contents) or file contents?

> 3) Write this (hex digest) to a separate file, in a sub directory of the
> parent, which has the same name and a .md5 extension

Same name as the file or same name as the directory?

> 4) Check the original file for its size
> 5) Add this data to the newly created file on a new line (in bytes)

Will this file contain information for one file or many files?

>
> I have a script that will do most of this, except for analysing the file
> size - I think that the file size being analysed may be the md5 object
> result as the same value is printed to each file

Print out the file size returned by stat. Check if it is the same displayed
by the ls command.

>
> I am running out of ideas and would appreciate any help you could give!
> I have tried using File::stat::OO and File::stat  - but to no avail - I
> could be using them incorrectly!

I am afraid I do not understand exactly what you are trying to accomplish. I
can't tell from your program whether or not you will end up with one digest
file for the entire directory, or one digest file for each file in the
directory.

>
> Many thanks in advance...
>
> Jon.
>
> Here are some details:
>
> System:
> Mac OSX 10.7.2
> Perl version 5.12
>
> Script:
>
> #!/usr/bin/perl
> # md5-test-3.plx
> use warnings;
> use strict;
> use Digest::MD5;
>
> my $filesize = 0;

You should declare variables where they are needed and not before.

> my $dir = shift || '/Users/jonharris/Documents/begperl';
> opendir (DH, $dir) or die "Couldn't open directory: $!";
>
> my $md5 = Digest::MD5->new;
> while ($_ = readdir(DH)) {

You are better off using a scalar variable and not the default variable,
which can get reused and overwritten:

while( my $file = readdir(DH) ) {

> $md5->add($_);

You are adding file names to a string to be "digested". Is that what you
want? Or do you want to calculate a digest for the contents of each file?

I have not used Digest::MD5, but if you to calculate the digest for the
contents of each file, you want to create a new digest object, open the
file, and use the addfile() method, then hexdigest() for each file.

> $filesize = (stat(DH))[7];

You are applying stat to the directory read handle. You want to fetch data
for the file (untested):

my $filesize = (stat("$dir/$file"))[7];

Note that you must prefix the file name with its full path.

> #### Is it necessary to put the following into a new loop?

No. It makes no sense to have a one-iteration loop.

>
> foreach ($_) {
> open FH, ">> /Users/jonharris/Documents/begperl/md5/$_.md5" or die $!;

You are appending lines to a file with a name that is based on an existing
file. Why?

> binmode(FH);

There is no need to set the mode of the output file to binary. Both
hexdigest and the file size will be written in ascii characters, not binary
data.


Shlomi Fish shlo...@shlomifish.org
Dec 19 (10 days ago)

to me, beginners, me 
Hi Jonathan,

some comments on your code - both positive and negative.
strict and warnings are good.

> use Digest::MD5;

So is using a module.

>
> my $filesize = 0;

You shouldn't predeclare your variables.

> my $dir = shift || '/Users/jonharris/Documents/begperl';
> opendir (DH, $dir) or die "Couldn't open directory: $!";

Don't use bareword dir handles - use lexical ones. It's good that you're using
the "or die" thing, though.

>
> my $md5 = Digest::MD5->new;

Seems like you're using the same $md5 object times and again which will
calculate cumulative MD5 sums instead of per-file ones.

> while ($_ = readdir(DH)) {

1. You're not skipping "." and "..".

2. You're not skipping other directories.

3. The $_ variable can be easily devastated. You should use a lexical one.

> $md5->add($_);

According to http://metacpan.org/module/Digest::MD5 the "add()" methods adds
data, and here it will only add the filename. You need to use addfile() with an
opened file handle instead.

> $filesize = (stat(DH))[7];

You shouldn't stat the directory handle. Instead stat "$dir/$filename" (you can
also use the core File::Spec module if you want to make it extra portable).

> #### Is it necessary to put the following into a new loop?
>
> foreach ($_) {

Why the foreach ($_) here? It does nothing. You're already iterating on the
files.

> open FH, ">> /Users/jonharris/Documents/begperl/md5/$_.md5" or die $!;
> binmode(FH);
> print FH $md5->hexdigest, "\n", $filesize;
> }
> close FH;


Use lexical file handles here, use three-args open - «open my $fh, '>>',
'/Users....'» and don't open it time and again. Keep it outside the loop.

For more information see:

http://perl-begin.org/tutorials/bad-elements/

Regards,

       Shlomi Fish


Jonathan Harris
Dec 19 (10 days ago)

to Jim, beginners 

Hi Jim

Thanks for responding

Here are some answers to your questions:

> A "hash" of what? File names (directory contents) or file contents? 

The aim is to produce a digest for each file in the folder

> Same name as the file or same name as the directory?

Each file created will have the same filename as the original, but with the 
extension '.md5'

> Will this file contain information for one file or many files?

This file will contain information for only one file
This is so that when looking at the treated directory, the end result will be:

some_movie.mov
some_movie.mov.md5
another_movie.mov
another_movie.mov.md5

etc...

> Print out the file size returned by stat. Check if it is the same displayed
   by the ls command.

I really have! However, this is where I am coming unstuck

> I am afraid I do not understand exactly what you are trying to accomplish. I
   can't tell from your program whether or not you will end up with one digest
   file for the entire directory, or one digest file for each file in the
   directory.

The program creates one digest file for each file in the directory : )

> my $filesize = 0;
> You should declare variables where they are needed and not before.

Noted


> my $md5 = Digest::MD5->new;
> while ($_ = readdir(DH)) {

> You are better off using a scalar variable and not the default variable, 
> which can get reused and overwritten:

Noted

> $filesize = (stat(DH))[7];

> You are applying stat to the directory read handle. You want to fetch data
> for the file (untested):

> my $filesize = (stat("$dir/$file"))[7];

This could well be where it's going wrong

> open FH, ">> /Users/jonharris/Documents/begperl/md5/$_.md5" or die $!;

> You are appending lines to a file with a name that is based on an existing
> file. Why?

This is what was requested in the original brief - to have the digest file with 
the same filename as the original, just with the extension '.md5'

> binmode(FH);

> There is no need to set the mode of the output file to binary. Both
> hexdigest and the file size will be written in ascii characters, not binary
> data.

I did this as it was part of the example code on the Digest::MD5 page on CPAN
If it's not necessary, I'll drop it from the script

Thanks for your help, Jim.
I'm going to have a look over the script and try to produce something new

Bests

Jon

Jonathan Harris
Dec 19 (10 days ago)

to beginners 

Hi Shlomi

Thanks for your response

To answer your questions:

> my $filesize = 0;

>You shouldn't predeclare your variables 

Noted, thanks

> my $dir = shift || '/Users/jonharris/Documents/begperl';
> opendir (DH, $dir) or die "Couldn't open directory: $!";

> Don't use bareword dir handles - use lexical ones. It's good that you're using
> the "or die" thing, though.

Of course - will have to re-read the section on barewords...

> my $md5 = Digest::MD5->new;

> Seems like you're using the same $md5 object times and again which will
> calculate cumulative MD5 sums instead of per-file ones.

In which case, I think that it will have to go in the loop so that a new 
instance is produced each time?

> while ($_ = readdir(DH)) {

> 1. You're not skipping "." and "..".
> 2. You're not skipping other directories.

I'm sure I read somewhere, that the parent was automatically skipped
Must be getting confused
However, adding this will be ok:

next if $_ eq "." or $_ eq "..";

> 3. The $_ variable can be easily devastated. You should use a lexical one.

I'll certainly use lexical in the future


> $md5->add($_);

> According to http://metacpan.org/module/Digest::MD5 the "add()" methods adds
> data, and here it will only add the filename. You need to use addfile() with 
> an
> opened file handle instead.

That makes sense

> $filesize = (stat(DH))[7];

> You shouldn't stat the directory handle. Instead stat "$dir/$filename" (you 
> can
> also use the core File::Spec module if you want to make it extra portable).

This is where I have been tying my head in knots
I'll give that a try
Although, I do remember trying and getting an error:

use of uninitialised value at line x

Probably because the code prior to it was incorrect

> foreach ($_) {

> Why the foreach ($_) here? It does nothing. You're already iterating on the
> files.

Thought as much - I tried it both ways and it didn't seem necessary - as you 
said, I'm already iterating over the files


> Use lexical file handles here, use three-args open - «open my $fh, '>>',
> '/Users....'» and don't open it time and again. Keep it outside the loop.

> For more information see:

http://perl-begin.org/tutorials/bad-elements/

Thanks for all your help

I'll have a read-up a rethink and a re-write!

All the yes
-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Reply via email to