On Fri, Dec 30, 2011 at 11:58 AM, Igor Dovgiy <ivd.pri...@gmail.com> wrote:
> Hi John, yes, good point! Totally forgot this. ) Adding new files to a
> directory as you browse it is just not right, of course. Possible, but not
> right. )
>
> I'd solve this by using a hash with filenames as keys and the collected
> 'result' strings (with md5 and filesizes) as values, filled by the
> File::Find target routine. After the whole directory is processed, this
> hash should be 'written out' into the target directory.
>
> Another way to do it is to collect all the filenames into a list (with the
> glob operator, for example), and process that list afterwards.
>
> BTW (to Jonathan), I wonder do you really need to store this kind of data
> in different files? No offence... but I can hardly imagine how this data
> will be used later unless gathered into some array or hash. )
>
> -- iD
>
> 2011/12/30 John W. Krahn <jwkr...@shaw.ca>
>
>> Jonathan Harris wrote:
>>
>>> Hi John
>>>
>>> Thanks for your 2 cents
>>>
>>> I hadn't considered that the module wouldn't be portable
>>
>> That is not what I was implying. I was saying that when you add new files
>> to a directory that you are traversing you _may_ get irregular results.
>> It depends on how your operating system updates directory entries.

Hi All

John - Thanks for the clarification. In this instance the script has been
run on OS X, and adding the files to the directory being traversed worked
OK this time. For best practice, however, I will certainly look into
writing to a separate directory and then moving the files back, as I
appreciate that this good fortune may not be repeated in a different
environment!
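A minimal sketch of the approach Igor describes: fill a hash during the File::Find traversal, then write the checksum files out only after the walk is finished, so no new files appear in a directory while it is being traversed. The directory names and the demo file here are hypothetical stand-ins, not from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);
use Digest::MD5;

# Hypothetical demo data: a temporary source tree with one file in it
my $source_dir = tempdir( CLEANUP => 1 );
my $target_dir = tempdir( CLEANUP => 1 );
open my $demo, '>', File::Spec->catfile($source_dir, 'clip.mov') or die $!;
print {$demo} "pretend this is video data\n";
close $demo;

my %result;    # filename => "md5  size" string, filled during traversal

find(sub {
    return unless -f $_;
    open my $fh, '<:raw', $_ or die "Cannot open $_: $!";
    $result{$File::Find::name} =
        Digest::MD5->new->addfile($fh)->hexdigest . '  ' . (-s $_);
    close $fh;
}, $source_dir);

# Only now, with the traversal complete, is the collected data written out
for my $file (sort keys %result) {
    (my $base = $file) =~ s{.*[/\\]}{};    # strip the directory part
    my $out = File::Spec->catfile($target_dir, "$base.md5");
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    print {$ofh} "$result{$file}\n";
    close $ofh;
}
print "wrote ", scalar(keys %result), " checksum file(s)\n";
```

The same separation also covers Igor's second suggestion: replacing the `find` call with a `glob` that collects filenames into a list, then looping over that list afterwards.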
Igor -

Firstly, File::Spec: thanks for your insight and well-explained
investigation; I have been learning a lot from this. File::Spec has proven
a most useful tool for joining and 'stringifying' the paths.

In the original post about this script, I had spoken about considering a
hash for the file data, and I'm still convinced that ultimately this would
be the way forwards. I have found some scripts online concerning finding
duplicate files: they use md5 and/or file sizes to compare the files, and
the results are written into hashes. Fully understanding some of these
scripts is a little beyond my level at the moment, but they have proved
quite inspiring! I have attached an interesting one for you to look at
(you may be aware of it already!).

>> (substr($line, 0, 1) eq '.')

Haven't learned this yet! It looks like a good solution if it is so much
more efficient; thanks for the introduction, I'll be reading up asap!

>> BTW (to Jonathan), I wonder do you really need to store this kind of data
>> in different files? No offence... but I can hardly imagine how this data
>> will be used later unless gathered into some array or hash. )

There is a good reason for this! Talking to people who work in video on
demand, it seems that this is standard practice for file delivery
requirements. As each video file must be identical upon receipt as it was
upon delivery (and the files are all treated as unique delivery instances),
a separate accompanying file is required. I thought that Perl would be a
good choice for accomplishing this requirement, as it is renowned for file
handling.

#####

Thanks to everyone for your help and contributions - particularly Jim,
Shlomi, John and Igor. I have learned crazy amounts already!

Happy New Year to you all!

Jonathan
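For reference, a small self-contained illustration of the `substr` test quoted above: `substr($line, 0, 1)` extracts the first character of `$line`, and comparing it to `'.'` is a cheap way to skip Unix-style hidden files without compiling a regex. The filenames below are just made-up examples:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# substr($line, 0, 1) returns the first character of $line; names that
# start with '.' are hidden files on Unix-like systems (including OS X).
for my $line (qw(.DS_Store .hidden movie.mov readme.txt)) {
    if ( substr($line, 0, 1) eq '.' ) {
        print "$line: hidden\n";
    }
    else {
        print "$line: visible\n";
    }
}
```

This prints "hidden" for `.DS_Store` and `.hidden`, and "visible" for the other two names.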
finddupes3.plx
Description: Binary data
--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/