On Fri, Dec 30, 2011 at 11:58 AM, Igor Dovgiy <ivd.pri...@gmail.com> wrote:

> Hi John, yes, good point! Totally forgot this. ) Adding new files to a
> directory as you browse it is just not right, of course. Possible, but not
> right. )
>
> I'd solve this by using a hash with filenames as keys and the collected
> 'result' strings (with md5 and file sizes) as values, filled in by the
> File::Find wanted routine.
> After the whole directory is processed, this hash can be written out
> into the target directory.
>
> Another way to do it is to collect all the filenames into a list first
> (with the glob operator, for example) and process that list afterwards.
>
> BTW (to Jonathan), I wonder whether you really need to store this kind
> of data in different files? No offence... but I can hardly imagine how
> this data will be used later unless it is gathered into some array or
> hash. )
>
> -- iD
>
> 2011/12/30 John W. Krahn <jwkr...@shaw.ca>
>
> > Jonathan Harris wrote:
> >
> >> Hi John
> >>
> >> Thanks for your 2 cents
> >>
> >> I hadn't considered that the module wouldn't be portable
> >>
> >
> > That is not what I was implying.  I was saying that when you add new
> > files to a directory that you are traversing you _may_ get irregular
> > results.  It depends on how your operating system updates directory
> > entries.
> >
> >
>


Hi All

John -
Thanks for the clarification
In this instance the script has been run on OS X - it seems that adding
the files to the directory being traversed works OK this time

However, for best practice I would certainly look into writing to a
separate directory and then moving the files back, as I appreciate that
this good fortune may not be repeated in a different environment!

Igor -
Firstly - File::Spec
Thanks for your insight and well-explained investigation - I have been
learning a lot from this
File::Spec has proven a most useful tool for joining and 'stringifying'
the paths
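
For anyone following along, this is roughly the kind of thing I mean
(a minimal sketch - the directory and file names are made up for
illustration):

use strict;
use warnings;
use File::Spec;

# Build a path from its parts, using the right separator for the OS.
my $dir  = File::Spec->catdir( 'videos', 'incoming' );
my $path = File::Spec->catfile( $dir, 'clip001.mov' );

print "$path\n";   # e.g. videos/incoming/clip001.mov on Unix-like systems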

In the original post about this script, I spoke about considering
using a hash for the file data
I'm still convinced that, ultimately, this would be the way forward
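
To make that concrete, here is a rough sketch of the hash-based approach
Igor describes above - collect everything during the traversal, and only
write (here, just print) after it has finished. The directory name is
invented for the example:

use strict;
use warnings;
use File::Find;
use Digest::MD5;

my $source = 'videos/incoming';   # invented example directory
my %info;                         # full path => "md5  size"

find( sub {
    return unless -f $_;
    open my $fh, '<', $_ or die "Cannot open '$_': $!";
    binmode $fh;
    my $md5  = Digest::MD5->new->addfile($fh)->hexdigest;
    my $size = -s $_;
    $info{$File::Find::name} = "$md5  $size";
}, $source );

# The directory is only touched again after the traversal is done.
for my $file ( sort keys %info ) {
    print "$file => $info{$file}\n";
}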

I have found some scripts online for finding duplicate files
They use MD5 digests and/or file sizes to compare the files, and the
results are stored in hashes
Fully understanding some of these scripts is a little beyond my level at
the moment
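
As far as I can tell, the core trick in those scripts is to turn the
hash inside out - key by digest instead of by filename - so any digest
that collects more than one filename marks a set of duplicates. A sketch
of just that step, with invented sample data standing in for real files:

use strict;
use warnings;

# Invented sample data: filename => MD5 digest.
my %digest_of = (
    'a.mov' => 'd41d8cd98f00b204e9800998ecf8427e',
    'b.mov' => 'd41d8cd98f00b204e9800998ecf8427e',
    'c.mov' => '9e107d9d372bb6826bd81d3542a419d6',
);

my %by_digest;
while ( my ( $file, $digest ) = each %digest_of ) {
    push @{ $by_digest{$digest} }, $file;
}

for my $digest ( sort keys %by_digest ) {
    my @files = sort @{ $by_digest{$digest} };
    print "Possible duplicates: @files\n" if @files > 1;
}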

I have attached an interesting one for you to look at (you may be aware
of it already!)
In any case, it has proved quite inspiring!

>> (substr($line, 0, 1) eq '.')

I haven't learned this yet!
It looks like a good solution if it is that much more efficient - thanks
for the introduction - I'll be reading up ASAP!
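
If I have understood it correctly, the point is to test the first
character directly instead of firing up the regex engine - for example,
to skip dot-files when reading a directory:

use strict;
use warnings;

opendir my $dh, '.' or die "Cannot open directory: $!";
while ( my $entry = readdir $dh ) {
    next if substr( $entry, 0, 1 ) eq '.';   # skip '.', '..' and hidden files
    print "$entry\n";
}
closedir $dh;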

>> BTW (to Jonathan), I wonder whether you really need to store this kind
>> of data in different files? No offence... but I can hardly imagine how
>> this data will be used later unless it is gathered into some array or
>> hash. )

There is a good reason for this!
Talking to people who work in video on demand, it seems that this is
standard practice for file delivery requirements
As each video file must be identical on receipt to what it was on
delivery (and the files are all treated as unique delivery instances),
a separate accompanying file is required
I thought that Perl would be a good choice for meeting this requirement,
as it is renowned for file handling
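
So the idea is: for each video file delivered, write one small companion
file recording its checksum and size, which the receiver can use to
verify the copy. A sketch of what I have in mind - the '.md5' suffix and
the file name are my own invention, not a known delivery spec:

use strict;
use warnings;
use Digest::MD5;

# Write a companion checksum file alongside a delivered video file.
sub write_companion {
    my ($video) = @_;

    open my $in, '<', $video or die "Cannot open '$video': $!";
    binmode $in;
    my $md5  = Digest::MD5->new->addfile($in)->hexdigest;
    my $size = -s $video;
    close $in;

    open my $out, '>', "$video.md5" or die "Cannot write '$video.md5': $!";
    print {$out} "$md5  $size  $video\n";
    close $out;
}

write_companion('clip001.mov');   # invented example file name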

#####

Thanks to everyone for your help and contributions - particularly Jim,
Shlomi, John and Igor
I have learned crazy amounts already!

Happy New Year to you all!

Jonathan

Attachment: finddupes3.plx

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
