On Apr 1, 4:59 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Tue, 31 Mar 2009 14:26:08 -0700 (PDT), ritu
> <ritu_bhandar...@yahoo.com> declaimed the following in
> gmane.comp.python.general:
>
> > if ( ( -B $filename ||
> >        $filename =~ /\.pdf$/ ) &&
> >      -s $filename > 0 ) {
> >     return(1);
> > }
>
> According to my old copy of the Camel, -B only reads the "first
> block" of the file. If the block contains a <NUL>, or if ~30% of the
> block contains bytes >127 or from some (undefined) set of control
> characters (that is, I expect it does not count <LF>, <CR>, <TAB>, <VT>,
> <FF>, maybe some others)... So...

Not sure whether this is meant to be rough pseudocode or an April 1
"jeu d'esprit" or ...

> def isbin(fid):
>     fin = open(fid, "r")

(1) mode = "rb" might be better

>     block = fin.read(1024) #what is the size of a "block" these days
>     binary = "\0" in block
>     if not binary:
>         mrkrs = [b for b in block
>                  if b > 127

(2) [assuming Python 2.x] b is a str object; change 127 to "\x7f"

>                  or b in [ "\r", "\n", "\t" ]
>                  ] #add needed

(3) surely you mean "b not in"

(4) possible improvements on ["\r", etc etc]:
(4a) use tuple ("\r", etc etc)
(4b) use string "\r\n\t"
(you don't really want to build that list from scratch for each byte
tested, do you?)

>         binary = (float(len(mrkrs)) / len(block)) > 0.30
>     fin.close()
>     return binary

Cheers,
John
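
P.S. For what it's worth, here is a rough Python 3 sketch with points (1)
to (4) folded in. It only approximates the -B behaviour described above;
it is not Perl's actual rule. Assumptions of mine: the 1024-byte block
size, treating CR, LF, TAB, FF and VT as the harmless control characters,
treating an empty file as not binary (which also avoids dividing by zero),
and the helper names is_odd_byte and looks_binary.

```python
# Sketch only: approximates the described behaviour of Perl's -B test.
TEXT_CONTROLS = frozenset(b"\r\n\t\f\v")  # assumed "harmless" control bytes
BLOCK_SIZE = 1024                         # assumed size of the "first block"
THRESHOLD = 0.30                          # the ~30% figure quoted above


def is_odd_byte(b):
    """True for bytes > 127 or control bytes outside TEXT_CONTROLS."""
    return b > 127 or (b < 32 and b not in TEXT_CONTROLS)


def looks_binary(block):
    """Apply the heuristic to a bytes object holding the first block."""
    if not block:
        # Empty input: call it "not binary" (an assumption) and avoid
        # dividing by zero below.
        return False
    if b"\0" in block:
        return True
    # Iterating over bytes yields ints in Python 3, so compare with ints.
    odd = sum(1 for b in block if is_odd_byte(b))
    return odd / len(block) > THRESHOLD


def isbin(path):
    """Read the first block of the file in binary mode and test it."""
    with open(path, "rb") as fin:
        return looks_binary(fin.read(BLOCK_SIZE))
```

isbin(path) then answers from the first block only. The Perl original
also counted any name ending in .pdf as binary and required a non-empty
file; neither check is reproduced here.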