Might be a silly question, but if you found a file which wasn't valid, what are 
you going to do?

Why wouldn't a file be valid?  

> On 17 Aug 2016, at 21:56, Kevin J Cully <[email protected]> wrote:
> 
> The thought was to keep an MD5 of each file (or similar), and if that changes 
> then trigger the actual validation.  First run would be intense, but most 
> files don't change much.  Perhaps ever.
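
A minimal Python sketch of the hash-then-validate idea described above, in case it helps to see it concretely. The function names (file_digest, changed_files) and the JSON manifest layout are hypothetical, chosen only for illustration, and the manifest is assumed to live outside the scanned tree so it does not hash itself.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(root, manifest_path):
    """Return files under root that are new or changed since the last run.

    The manifest (hypothetical format: {path: md5}) is rewritten on every call,
    so keep it outside root.
    """
    manifest_file = Path(manifest_path)
    old = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    new = {}
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        key = str(path)
        new[key] = file_digest(path)
        if old.get(key) != new[key]:
            changed.append(key)  # candidate for the expensive validation step
    manifest_file.write_text(json.dumps(new, indent=2))
    return changed
```

The first call reports every file (the "intense" first run); later calls report only files whose digest differs, which is where the real validation would be triggered.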
> 
> It's funny you mention LibreOffice, because a suggestion I received was to 
> use the command line tool 'soffice.exe' which is part of LibreOffice to check 
> the office documents.  Basically, if soffice can turn it into a PDF (to be 
> deleted afterward), then the file would be considered a 'valid' file.  
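
A rough Python sketch of that "can soffice turn it into a PDF" test. It assumes soffice is on the PATH and uses LibreOffice's --headless, --convert-to and --outdir options; the helper names are hypothetical. It checks for the output file rather than trusting the exit code, and the scratch directory is removed afterward.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(document, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command that converts one document to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(document)]


def converts_to_pdf(document, soffice="soffice", timeout=120):
    """Return True if LibreOffice produced a non-empty PDF for the document."""
    with tempfile.TemporaryDirectory() as out_dir:
        command = build_convert_command(document, out_dir, soffice)
        try:
            subprocess.run(command, capture_output=True, timeout=timeout, check=False)
        except (OSError, subprocess.TimeoutExpired):
            # soffice missing, not executable, or hung on the file
            return False
        pdf = Path(out_dir) / (Path(document).stem + ".pdf")
        return pdf.is_file() and pdf.stat().st_size > 0
```

One caveat worth testing: LibreOffice may import an unrecognizable file as plain text and still emit a PDF, so a successful conversion is weaker evidence than it first appears.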
> 
> Regarding images, ImageMagick's "identify" tool would report the metadata 
> from image files. It is also in the running to enter the test phase of 
> this project.  http://www.imagemagick.org/script/identify.php
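
A small Python sketch of wrapping "identify", assuming the ImageMagick command is on the PATH under that name (on ImageMagick 7 installs it may need to be invoked as "magick identify") and that it exits non-zero when it cannot read a file. The function name and the runner parameter are hypothetical; the runner exists only so the call can be swapped out in tests.

```python
import subprocess


def image_is_readable(path, identify="identify", timeout=60, runner=subprocess.run):
    """Return (ok, details) from running ImageMagick's identify on one file."""
    try:
        result = runner([identify, str(path)],
                        capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as error:
        # Tool not installed, or it hung on the file
        return False, str(error)
    if result.returncode != 0:
        return False, result.stderr.strip()
    # On success, stdout carries the format and geometry line(s)
    return True, result.stdout.strip()
```

Note that plain identify mostly reads headers, so a file truncated partway through the pixel data might still pass; that would need checking against real samples.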
> 
> In the case of a Word document with a macro virus, hopefully (fingers 
> crossed!) the malware scan would find it as soon as it was saved. If we're 
> using LibreOffice, we'd hopefully have the option to disable macros when 
> (test) converting it to PDF.
> 
> This definitely would be interesting. I hope I get the green light to work on 
> it.
> 
> "Most useful complex projects begin their lives as useful simple projects."
> 
> -Kevin
> 
> -----Original Message-----
> From: ProFox [mailto:[email protected]] On Behalf Of Ted Roche
> Sent: Wednesday, August 17, 2016 2:29 PM
> To: [email protected]
> Subject: Re: Common File Document Validation
> 
> That's a great question!
> 
> Obviously, since the post's subject didn't include "[NF]" you've already 
> found your solution -- FoxPro! *wink*
> 
> I've done some document management systems in VFP, and the recursion, 
> cataloging and checksums are easy, relatively speaking. But the validation is 
> an interesting twist, and a much more difficult problem.
> 
> Triggering the checking is also an interesting feature. Doing a bulk rescan 
> would be slow and intensive, though you could tune it to not consume 
> excessive resources, at a cost of slower checking.
> 
> Windows File Systems have some advanced features in the newer servers that 
> would let you hook into a file system event (adding a new file or saving over 
> an old one) to trigger your validation routine. If WinFS had ever been 
> released, (https://en.wikipedia.org/wiki/WinFS) that would have been perfect, 
> but alas, it was another empty vaporware promise of "The Old Microsoft." 
> However, some of "Longhorn" did end up in DotNet, like:
> 
> https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.changed(v=vs.110).aspx
> 
> A simpler solution might be a "Document Management System" but implementing 
> one of these is a tough challenge in technology, politics, and technical 
> support.
> 
> "Validity" is a bit nebulous. How are you defining that?
> 
> I mean, there are Word95 documents I can't open in Word2007, but can in 
> LibreOffice. And is a Word document with a macro virus valid? How many 
> versions and variations to support? How to handle password-encrypted or 
> restricted files?
> 
> VFP would be a great tool for doing the validation, where you can use 
> low-level file functions to read headers and calculate checksums, but complex 
> structured documents, like MS's Compound OLE Documents, and MS's ZIP-packaged 
> XML DocX documents, get a lot trickier.
> There's typically a "magic" signature at the beginning of most files that 
> will tell you its type, but whether all the contents have integrity is a lot 
> tougher to determine. I suspect each format would need to be reviewed to 
> determine if there were internal consistency checks that would tell you of 
> corruption or truncation.
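
To make the "magic signature" point concrete, here is a small Python sketch. The signature table is a short illustrative subset of well-known leading bytes, not a complete catalog, and the function names are hypothetical. For ZIP-based formats such as DOCX, the per-member CRCs are one of those internal consistency checks, and Python's zipfile.testzip() will verify them.

```python
import zipfile

# Well-known leading byte signatures; deliberately small and illustrative.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # also DOCX/XLSX/PPTX containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt
]


def sniff_type(path):
    """Return a coarse type name from the file's leading bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def zip_container_intact(path):
    """Return True if every member of a ZIP container passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first bad member name, or None if all pass
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False
```

A truncated DOCX illustrates the gap between the two checks: it still begins with the ZIP signature, so sniff_type reports "zip", but zip_container_intact returns False because the central directory at the end of the file is gone.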
> 
> Sounds like an interesting project, though. Will be interested to hear if you 
> find a suitable package, or DIY it.
> 
> --
> Ted Roche
> Ted Roche & Associates, LLC
> http://www.tedroche.com
> 
[excessive quoting removed by server]
