The file store approaches 3 TB now - just about 290 GB free on a 3.1 TB partition.
The concern is that I've noticed a fair number of ISO files (and potentially a lot of other files, including ZIPs and other archives, MPEGs, etc.) that seem to be duplicates of each other. I want to generate a report for the VP of engineering and let him know how bad the situation is - I'm going to guess there's close to 1 TB of redundancy currently. Yes, this will consume hours of time, but I can launch it over a weekend and take a look on the Monday following.

I like your idea of restartability, though - it's worth looking at as a secondary goal.
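A rough sketch of what a restartable pass might look like - untested, with placeholder paths, and assuming Get-FileHash is available (PowerShell 4.0+). It appends one CSV row per file as it goes, so a killed run can be restarted and will skip everything already hashed:

    # Placeholder paths - adjust as needed.
    $listFile   = 'C:\temp\files.txt'
    $resultFile = 'C:\temp\hashes.csv'

    # Build the file list once; a restarted run reuses the existing list.
    if (-not (Test-Path $listFile)) {
        Get-ChildItem C:\stuff -Recurse -File |
            Select-Object -ExpandProperty FullName |
            Set-Content $listFile
    }

    # Note which files already have a recorded result from a previous run.
    $done = @{}
    if (Test-Path $resultFile) {
        Import-Csv $resultFile | ForEach-Object { $done[$_.Path] = $true }
    }

    # Hash the remainder, appending one row per file as each completes.
    Get-Content $listFile |
        Where-Object { -not $done.ContainsKey($_) } |
        ForEach-Object {
            Get-FileHash -Algorithm MD5 -Path $_ |
                Select-Object Hash,
                    @{ Name = 'Length'; Expression = { (Get-Item $_.Path).Length } },
                    Path |
                Export-Csv $resultFile -NoTypeInformation -Append
        }

Appending per file is slower than batching the output, but it is what makes interruption cheap: the cost of a kill is at most one file's worth of hashing.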
Kurt

On Thu, Jul 30, 2015 at 2:19 PM, James Button <jamesbut...@blueyonder.co.uk> wrote:
> Not experienced enough in PowerShell to suggest code,
> BUT
> I would advise that you make the process run as a restartable facility, such
> that the process can be interrupted (if not by escape or Ctrl+C, then by
> task killing) and then, when restarted, will continue processing the list of
> files from the one after the last one for which a result was recorded.
>
> Working on the basis that you have a 1TB file store and are working towards
> a 3 or 6TB file store: even assuming your file store connection runs at
> 8Gb/sec, as in 60GB per minute, that's surely going to be an hour's
> full-time use of the interface, and I'd really expect the hashing process
> to take getting on for a day elapsed if the system is running spinning
> media on a more common interface connection, rather than a solid-state
> store on the fastest possible multi-channel interface.
>
> You may also need to consider the system overhead in assembling the list
> of files - the sheer volume of the MFT to be processed.
> I know this from a fair amount of the restructuring work I used to do for
> clients on a 4GB memory system with caddy'd drives -
> such as renaming files that filled a 1TB drive, for access as 'home
> drives' - before you had all the maintenance goodies in the admin
> facilities.
>
> (Having taken a complete list of files, stuck them in Excel, sorted them
> there, and generated a set of rename commands.)
>
> It took more time processing the MFT entries to "rename" the files in situ
> than it did to copy them to another drive with the new names, simply
> because of the thrashing on the MFT blocks in the OS-allocated disk read
> cache.
>
> JimB
>
>
> -----Original Message-----
> From: listsadmin@lists.myITforum.com [mailto:listsadmin@lists.myITforum.com]
> On Behalf Of Kurt Buff
> Sent: Thursday, July 30, 2015 8:45 PM
> To: powersh...@lists.myitforum.com
> Subject: [powershell] Need some pointers on an exercise I've set for myself
>
> I'm putting together what should be a simple little script, and failing.
>
> I am ultimately looking to run this against a directory, then sort the
> output on the hash field and then parse for duplicates. There are two
> conditions that concern me: 1) there are over 3 million files in the
> target directory, and 2) many of the files are quite large, over 1 GB.
>
> I'm more concerned about the effects of the script on memory than on
> processor - the data is fairly static, and I intend to run it once a
> month or even less - but I did choose MD5 as the hash algorithm for
> speed, rather than accept the default of SHA256.
>
> This is pretty simple stuff, I'm sure, but I'm using this as a
> learning exercise more than anything, as there are duplicate file
> finders out in the world already.
>
> There are several problems with what I have put together so far, which
> is this:
>
> Get-ChildItem c:\stuff -Recurse | select length, fullname |
> export-csv -NoTypeInformation c:\temp\files.csv
> Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash
> -algorithm md5 $_.FullName) }; Length | Sort hash
>
> Using Length (or $_.Length) anywhere in the ForEach statement gives an
> error, or gives weird output.
>
> Sample output when not using Length, and therefore getting reasonable
> output (extra spaces and hyphen delimiters elided):
>
> Algorithm  Hash                              Path
> MD5        592BE1AD0ED83C36D5E68CA7A014A510  C:\stuff\Tools\SomeFile.DOC
>
> What I'd like to see instead:
>
> Hash                              Length  Path
> 592BE1AD0ED83C36D5E68CA7A014A510  79872   C:\stuff\Tools\SomeFile.DOC
>
> If anyone can offer some instruction, I'd appreciate it.
>
> Kurt
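For what it's worth, the error in the pipeline above comes from the "; Length" at the end: the semicolon ends the ForEach-Object statement, so "Length | Sort hash" is parsed as a brand-new pipeline that starts with a command named Length, which doesn't exist. A minimal sketch of one way to get the desired Hash/Length/Path output - untested, assuming PowerShell 4.0+ for Get-FileHash, and skipping the intermediate CSV so the FileInfo object (and its Length) stays in reach:

    Get-ChildItem C:\stuff -Recurse -File |
        ForEach-Object {
            $h = Get-FileHash -Algorithm MD5 -Path $_.FullName
            [PSCustomObject]@{
                Hash   = $h.Hash
                Length = $_.Length    # $_ is still the FileInfo object here
                Path   = $_.FullName
            }
        } |
        Sort-Object Hash |
        Export-Csv C:\temp\files.csv -NoTypeInformation

One caveat on the memory concern: Sort-Object buffers everything before it can emit anything, so with 3 million files it may be friendlier to export unsorted and sort the CSV afterward. From there, something like Group-Object Hash | Where-Object { $_.Count -gt 1 } would pull out the duplicate sets for the report.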