Re: [fossil-users] Fossil interprets plain-text file as a binary file
On 3/27/2017 6:21 PM, Richard Hipp wrote:
> On 3/27/17, Ross Berteig wrote:
>> I believe that a line is too long if it is more than about 8191 ASCII
>> characters, a restriction based on the size of the buffer used in the
>> diff engine.
>
> Technically, that restriction is due to the way hashes are computed on
> individual lines during the diff. For diffing, the file is broken up
> into individual lines, and every line is given a 32-bit hash that helps
> to speed up locating the differences. The lower 13 bits of the hash are
> the length of the line in bytes. The upper 19 bits are the actual hash.

Interesting. I didn't read further into the code than the definition of LENGTH_MASK and the comment that describes it in diff.c. I did wonder slightly at the name of that symbol, but it was described as the length of a line so I just ran with it.

In lookslike.c we have UTF16_LENGTH_MASK, which the comment describes as the same quantity expressed for UTF16 chars. But the comment and definition don't seem to agree.

Richard, take a look at
https://www.fossil-scm.org/index.html/artifact?name=3ac38fafa91d274c=220-226

Line 225 would compute the mask width to be 13-2-1 or 10 bits, and get 1023 for UTF16_LENGTH_MASK. But the comment says 4096.

Either the code, the comment, or I am confused here. Since I'm poking at test cases for this stuff, I'll see if I can add one that probes the UTF16 line length question.

-- 
Ross Berteig r...@cheshireeng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
+1 626 303 1602
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
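For anyone following the arithmetic in this message, here is a small Python sketch of how a bit width turns into a mask value. It is not Fossil code (Fossil is written in C). The 13-bit width and the "13-2-1" reading come from the thread. The second grouping, 13-(2-1), is only a hypothetical alternative shown for comparison, and the thread does not say how the macro actually groups its terms.

```python
LENGTH_MASK_SZ = 13                       # bit width quoted in the thread
LENGTH_MASK = (1 << LENGTH_MASK_SZ) - 1   # 8191, the "too long" threshold


def mask_for_width(bits: int) -> int:
    """Largest value that fits in an unsigned field of `bits` bits."""
    return (1 << bits) - 1


# Reading described in the message: 13 - 2 - 1 = 10 bits.
WIDTH_AS_READ = LENGTH_MASK_SZ - 2 - 1
# Hypothetical alternative grouping, 13 - (2 - 1) = 12 bits.
# This is an assumption for comparison only, not confirmed by the thread.
WIDTH_ALTERNATIVE = LENGTH_MASK_SZ - (2 - 1)

for label, width in (("as read", WIDTH_AS_READ),
                     ("alternative", WIDTH_ALTERNATIVE)):
    print(f"{label}: {width} bits -> mask {mask_for_width(width)}")
```

A 10-bit field tops out at 1023, while a 12-bit field tops out at 4095, which is at least close to the 4096 mentioned in the comment. A test case that probes UTF-16 line lengths around those values would settle which one applies.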
Re: [fossil-users] Fossil interprets plain-text file as a binary file
On 3/27/2017 6:40 PM, Byron Sanchez wrote:
> That was it! I ran the command and received the output:
>
> Starts with UTF-8 BOM: no
> Starts with UTF-16 BOM: no
> Looks like UTF-8: no
> Has flag LOOK_NUL: yes
> Has flag LOOK_CR: no
> Has flag LOOK_LONE_CR: no
> Has flag LOOK_LF: yes
> Has flag LOOK_LONE_LF: yes
> Has flag LOOK_CRLF: no
> Has flag LOOK_LONG: no
> Has flag LOOK_INVALID: no
> Has flag LOOK_ODD: no
> Has flag LOOK_SHORT: no
>
> I deleted the null characters. I didn't have to address any of the
> other flags in my case, just the null characters. After that, fossil
> recognized the file as plain text again.

Unexpected NUL characters in a field of normal text will definitely cause fossil to treat a file as binary. If you really do need to store a NUL byte in a text file (as some sort of delimiter, perhaps), fossil permits the over-long two-byte UTF-8 encoding 0xC0 0x80 even though that is a technical violation of the UTF-8 specification. Allowing that particular over-long encoding is a common extension of UTF-8.

The other flags just indicate that you have normal *nix line endings rather than the CR LF endings used by DOS and Windows (and many, many older systems) or the CR-only endings used by older Macs.

-- 
Ross Berteig r...@cheshireeng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
+1 626 303 1602
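To locate stray NUL bytes before deleting them, something like the following Python sketch may help. The function names are made up for illustration and are not part of Fossil. The second helper shows the 0xC0 0x80 substitution described above, for the case where a NUL really has to stay in the file. Note that strict UTF-8 decoders, Python's included, reject that byte pair, so other tools may not accept the result.

```python
def nul_offsets(data: bytes) -> list:
    """Return the byte offset of every NUL (0x00) in data."""
    return [i for i, b in enumerate(data) if b == 0]


def replace_nul_with_overlong(data: bytes) -> bytes:
    """Swap each NUL byte for the over-long two-byte form 0xC0 0x80."""
    return data.replace(b"\x00", b"\xc0\x80")


# Hypothetical org-mode fragment with one stray NUL byte in it.
sample = b"* TODO item\x00\nnotes\n"
print("NUL offsets before:", nul_offsets(sample))
cleaned = replace_nul_with_overlong(sample)
print("NUL offsets after: ", nul_offsets(cleaned))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) so the bytes are inspected exactly as stored.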
Re: [fossil-users] Fossil interprets plain-text file as a binary file
That was it! I ran the command and received the output:

Starts with UTF-8 BOM: no
Starts with UTF-16 BOM: no
Looks like UTF-8: no
Has flag LOOK_NUL: yes
Has flag LOOK_CR: no
Has flag LOOK_LONE_CR: no
Has flag LOOK_LF: yes
Has flag LOOK_LONE_LF: yes
Has flag LOOK_CRLF: no
Has flag LOOK_LONG: no
Has flag LOOK_INVALID: no
Has flag LOOK_ODD: no
Has flag LOOK_SHORT: no

I deleted the null characters. I didn't have to address any of the other flags in my case, just the null characters. After that, fossil recognized the file as plain text again.

Thank you for the help!

On Mon, Mar 27, 2017 at 9:09 PM, Ross Berteig wrote:
> On 3/27/2017 5:44 PM, Byron Sanchez wrote:
>
>> I'm tracking several plain-text files in a repository. These are emacs
>> org-mode files.
>>
>> Fossil sees most of the files in this repo as normal plain-text files and
>> as such, they can be diffed via the fossil web interface.
>>
>> Recently, however, fossil has started interpreting one of these org-mode
>> files as a binary file. Now, fossil prompts with its binary-file warning
>> each time I update the file. In addition, this file can no longer be diffed
>> in the web interface, since fossil believes it to be a binary file.
>>
>> I'm wondering what steps I should take to debug this, or if there are any
>> common causes for this sort of thing? Very long lines perhaps or possibly
>> unicode characters?
>
> Try the command "fossil test-looks-like-utf" to see the conditions that
> fossil tests for your file. That should help narrow down what to look for
> in the file that caused it to suddenly smell binary. It usually decides a
> file is binary if it has a line that is "too long", or has a NUL byte and
> is not UTF-16.
>
> I believe that a line is too long if it is more than about 8191 ASCII
> characters, a restriction based on the size of the buffer used in the diff
> engine.
>
> The other thing that can happen is to accidentally save a text file in an
> encoding other than UTF-8, with some character not included in the base
> 7-bit ASCII set. In my experience this was usually some accented letter
> from LATIN1, or a symbol such as 'µ' or '°'. Your editor will likely calmly
> edit and save the file, everything looks fine, but the saved file has bytes
> that make an invalid UTF-8 sequence. That does have a different warning
> message than binary data (likely "invalid UTF-8") so isn't your problem
> with this file.
>
>> The file in question is about 3.3 megabytes in size, and as far as I am
>> aware, a normal plain-text org-mode file.
>>
>> Any ideas would be very appreciated!
>
> --
> Ross Berteig r...@cheshireeng.com
> Cheshire Engineering Corp. http://www.CheshireEng.com/
> +1 626 303 1602
Re: [fossil-users] Fossil interprets plain-text file as a binary file
On 3/27/17, Ross Berteig wrote:
> I believe that a line is too long if it is more than about 8191 ASCII
> characters, a restriction based on the size of the buffer used in the
> diff engine.

Technically, that restriction is due to the way hashes are computed on individual lines during the diff. For diffing, the file is broken up into individual lines, and every line is given a 32-bit hash that helps to speed up locating the differences. The lower 13 bits of the hash are the length of the line in bytes. The upper 19 bits are the actual hash.

-- 
D. Richard Hipp
d...@sqlite.org
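The layout described above can be sketched in a few lines of Python. This is only an illustration of packing a length into the low 13 bits and a hash into the remaining 19 bits of a 32-bit value. The function names are hypothetical, and the mixing step is a placeholder, not the hash Fossil actually uses in its C code.

```python
LENGTH_MASK_SZ = 13                       # low bits that hold the line length
LENGTH_MASK = (1 << LENGTH_MASK_SZ) - 1   # 8191, the longest encodable length
HASH_BITS = 32 - LENGTH_MASK_SZ           # 19 bits left over for the hash
HASH_MASK = (1 << HASH_BITS) - 1


def pack_line_hash(line: bytes) -> int:
    """Pack a 19-bit hash and a 13-bit length into one 32-bit value."""
    if len(line) > LENGTH_MASK:
        raise ValueError("line length does not fit in 13 bits")
    h = 0
    for byte in line:
        # Placeholder mixing step (an assumption, not Fossil's hash).
        h = (h * 31 + byte) & HASH_MASK
    return (h << LENGTH_MASK_SZ) | len(line)


def line_length(packed: int) -> int:
    """Recover the line length from the low 13 bits."""
    return packed & LENGTH_MASK


packed = pack_line_hash(b"* TODO write the report")
print(f"packed=0x{packed:08x} length={line_length(packed)}")
```

With only 13 bits for the length, any line longer than 8191 bytes cannot be represented, which matches the "too long" threshold mentioned earlier in the thread.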
Re: [fossil-users] Fossil interprets plain-text file as a binary file
On 3/27/2017 5:44 PM, Byron Sanchez wrote:
> I'm tracking several plain-text files in a repository. These are emacs
> org-mode files.
>
> Fossil sees most of the files in this repo as normal plain-text files
> and as such, they can be diffed via the fossil web interface.
>
> Recently, however, fossil has started interpreting one of these
> org-mode files as a binary file. Now, fossil prompts with its
> binary-file warning each time I update the file. In addition, this file
> can no longer be diffed in the web interface, since fossil believes it
> to be a binary file.
>
> I'm wondering what steps I should take to debug this, or if there are
> any common causes for this sort of thing? Very long lines perhaps or
> possibly unicode characters?

Try the command "fossil test-looks-like-utf" to see the conditions that fossil tests for your file. That should help narrow down what to look for in the file that caused it to suddenly smell binary. It usually decides a file is binary if it has a line that is "too long", or has a NUL byte and is not UTF-16.

I believe that a line is too long if it is more than about 8191 ASCII characters, a restriction based on the size of the buffer used in the diff engine.

The other thing that can happen is to accidentally save a text file in an encoding other than UTF-8, with some character not included in the base 7-bit ASCII set. In my experience this was usually some accented letter from LATIN1, or a symbol such as 'µ' or '°'. Your editor will likely calmly edit and save the file, everything looks fine, but the saved file has bytes that make an invalid UTF-8 sequence. That does have a different warning message than binary data (likely "invalid UTF-8") so isn't your problem with this file.

> The file in question is about 3.3 megabytes in size, and as far as I am
> aware, a normal plain-text org-mode file.
>
> Any ideas would be very appreciated!

-- 
Ross Berteig r...@cheshireeng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
+1 626 303 1602
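The encoding problem described above is easy to reproduce outside of Fossil. The Python sketch below uses a hypothetical helper name and a strict UTF-8 decoder, which is not necessarily the same check Fossil performs. It shows the same text passing when saved as UTF-8 and failing when saved as Latin-1.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under strict UTF-8 rules."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "25 °C, 10 µm"                  # both symbols exist in Latin-1
as_utf8 = text.encode("utf-8")         # each symbol becomes two bytes
as_latin1 = text.encode("latin-1")     # each symbol becomes one byte >= 0x80

print("UTF-8 bytes valid:  ", is_valid_utf8(as_utf8))
print("Latin-1 bytes valid:", is_valid_utf8(as_latin1))
```

The Latin-1 bytes 0xB0 and 0xB5 are continuation bytes with no lead byte in front of them, so a UTF-8 decoder rejects them even though the file looks fine in an editor that assumes Latin-1.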
Re: [fossil-users] Fossil interprets plain-text file as a binary file
On Mar 27, 2017 6:44 PM, "Byron Sanchez" wrote:
> Recently, however, fossil has started interpreting one of these
> org-mode files as a binary file. Now, fossil prompts with its
> binary-file warning each time I update the file. In addition, this file
> can no longer be diffed in the web interface, since fossil believes it
> to be a binary file.
>
> I'm wondering what steps I should take to debug this, or if there are
> any common causes for this sort of thing? Very long lines perhaps or
> possibly unicode characters?

Long lines, invalid unicode sequences, or many control codes. What type of data is it? Source code, poetry?
[fossil-users] Fossil interprets plain-text file as a binary file
I'm tracking several plain-text files in a repository. These are emacs org-mode files.

Fossil sees most of the files in this repo as normal plain-text files and as such, they can be diffed via the fossil web interface.

Recently, however, fossil has started interpreting one of these org-mode files as a binary file. Now, fossil prompts with its binary-file warning each time I update the file. In addition, this file can no longer be diffed in the web interface, since fossil believes it to be a binary file.

I'm wondering what steps I should take to debug this, or if there are any common causes for this sort of thing? Very long lines perhaps or possibly unicode characters?

The file in question is about 3.3 megabytes in size, and as far as I am aware, a normal plain-text org-mode file.

Any ideas would be very appreciated!

Thanks,
Byron Sanchez