I had a seemingly normal Python file checked in, and when I went to view the artifact, fossil claimed it was binary. I tracked it down to a line like this in my source:

    sys.stdout.write('^H')

where ^H is 0x08, the backspace character. My otherwise text-looking file was marked as binary by mimetype_from_content() in src/doc.c.

I know text/binary detection can be a tricky problem with UTF-8/16/32, so I'm wondering whether the isBinary check is really the best solution. I locally patched my fossil to look for a NUL byte instead, and that seems to work pretty well. Embedding a NUL in source code probably happens less often than embedding some of the other non-printable characters, and I would suspect your average binary file has a few NULs in it.

Should that check stay as is, become a NUL-based check (patch attached), or maybe become some sort of probability-based metric (e.g. fewer than 5% of the bytes are non-printable; still not great for UTF-* if we don't have a Unicode table of printable characters handy)?

Your thoughts?

Bill
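
P.S. To make the two alternatives concrete, here is a minimal Python sketch of a NUL-based check and a ratio-based check. It is only an illustration: the function names, the set of control bytes treated as text, and the 0.05 default threshold are my own assumptions, and this is neither the attached patch nor fossil's C code.

```python
# Hypothetical helpers for illustration only; not fossil's implementation.

# Control bytes that commonly appear in ordinary text files (assumed set).
TEXT_CONTROL_BYTES = frozenset(b"\t\n\r\f\v")


def looks_binary_nul(data: bytes) -> bool:
    """Treat content as binary only if it contains a NUL (0x00) byte."""
    return b"\x00" in data


def looks_binary_ratio(data: bytes, threshold: float = 0.05) -> bool:
    """Treat content as binary if too many bytes are unusual control bytes.

    Bytes >= 0x80 are counted as text so that UTF-8 input is not penalized.
    """
    if not data:
        return False
    suspicious = sum(
        1
        for b in data
        if (b < 0x20 and b not in TEXT_CONTROL_BYTES) or b == 0x7F
    )
    return suspicious / len(data) > threshold


# The line from my source file, with a literal backspace (0x08) in it.
sample = b"sys.stdout.write('\x08')\n"
print(looks_binary_nul(sample), looks_binary_ratio(sample))
```

Both checks classify the sample line as text. Note that UTF-16 or UTF-32 text made up of ASCII characters contains 0x00 bytes, so a plain NUL check would still call such files binary.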