Re: check if file is MS Word or PDF file

Michael Crute Sat, 27 Sep 2008 17:02:23 -0700

On Sat, Sep 27, 2008 at 7:01 PM, Chris Rebert <[EMAIL PROTECTED]> wrote:
> Looking at the docs for the mimetypes module, it just guesses based on
> the filename (and extension), not the actual contents of the file, so
> it doesn't really help the OP, who wants to make sure their program
> isn't misled by an inaccurate extension.


One other way to detect a PDF is to just read the first 5 bytes from
the file. Valid PDF files start with "%PDF-". Something similar can be
done with Word docs, but I don't know what the magic bytes are. This
approach is pretty similar to what the file command does, but it is
probably a better approach if you have to support multiple platforms.
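
A minimal sketch of that check in Python, reading the header in binary
mode and comparing it against known signatures. The function names
(sniff_header, sniff_file) are made up for illustration. The second
signature is an addition: it is the OLE2 compound-file signature used by
legacy .doc files, but the same container is also used by other Office
formats such as .xls and .ppt, so a match only means "possibly Word".

```python
# Signature at the start of a PDF file.
PDF_MAGIC = b"%PDF-"
# OLE2 compound-file signature (legacy .doc, but also .xls, .ppt, ...).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_header(header):
    """Classify a bytes header as 'pdf', 'ole2', or None (hypothetical helper)."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return None


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them (hypothetical helper)."""
    # Binary mode matters: text mode may decode or translate the bytes.
    with open(path, "rb") as f:
        return sniff_header(f.read(8))
```

Calling sniff_file("report.pdf") on a real PDF should return "pdf"
regardless of what the file is named, since the extension is never
consulted.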

-mike

-- 
________________________________
Michael E. Crute
http://mike.crute.org

God put me on this earth to accomplish a certain number of things.
Right now I am so far behind that I will never die. --Bill Watterson
--
http://mail.python.org/mailman/listinfo/python-list
