[fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
Hi all,

I've tried to use Fossil on the Russian version of Windows 7. I made a commit with Russian text in the comment, and when I ran the UI and looked at the timeline, the Russian text showed up as squares.

Is there any way to set the character encoding of commits? It would be nice if they could be converted to UTF-8 automatically while committing. If that is not possible, can I change the text of existing commits (into English, for example)?

--
Ruslan Popov
phone: +7 916 926 1205
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
Ruslan Popov wrote:
> I've tried to use Fossil on russian version of Windows 7. I made commit with
> russian text in comment, when I run the UI and look at timeline, I saw that
> russian text looks like squares.

Why not just use a text editor that supports UTF-8 and write your comments in UTF-8 instead of cp1251? You can set or change the default text editor with "fossil set editor anything-else-than-notepad" (I am using Notepad2, for example).

Question to Richard Hipp: can you please add some UTF-8 character(s) to the following text to help text editors auto-detect the right encoding?

# Enter comments on this check-in. Lines beginning with # are ignored.
# The check-in comment follows wiki formatting rules.
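To illustrate the advice in the message above, here is a minimal Python sketch of why CP1251 bytes turn into squares when a UTF-8 web UI reads them, and how transcoding repairs them. The sample comment is hypothetical, and nothing here reflects how Fossil itself handles comments.

```python
# Hypothetical check-in comment; any Cyrillic text behaves the same way.
comment = "Исправлена ошибка"

cp1251_bytes = comment.encode("cp1251")  # what an editor saving CP1251 writes
utf8_bytes = comment.encode("utf-8")     # what a UTF-8 web UI expects

# Reading CP1251 bytes as UTF-8 fails: the Cyrillic bytes are not valid
# UTF-8 sequences, so a lenient decoder substitutes U+FFFD for them
# (often drawn as a square or a question mark).
garbled = cp1251_bytes.decode("utf-8", errors="replace")

# Decoding with the real source encoding and re-encoding fixes the comment.
repaired = cp1251_bytes.decode("cp1251").encode("utf-8")
```

The repair only works when the source encoding is known, which is why writing the comment in UTF-8 in the first place is the simpler route.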
[fossil-users] May fossil push via ssh?
Hi,

Can Fossil push via ssh?

--
Ruslan Popov
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
Sergey, I now use Emacs with its mule-utf-8-unix encoding for the commit buffer.

On Fri, Jun 25, 2010 at 2:15 PM, Sergey Sfeli wrote:
> Why don't just use text editor that supports UTF-8 and write your
> comments in UTF-8 instead of cp1251? You can set/change default text
> editor with "fossil set editor anything-else-than-notepad" (I am using
> Notepad2, for example).

--
Ruslan Popov
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
On 25 June 2010 12:15, Sergey Sfeli wrote:
> Why don't just use text editor that supports UTF-8 and write your
> comments in UTF-8 instead of cp1251? You can set/change default text
> editor with "fossil set editor anything-else-than-notepad" (I am using
> Notepad2, for example).
>
> Question to Richard Hipp: can you please add any UTF-8 character(s) to the
> following text to help text editors to auto-detect the right encoding?

Perhaps Fossil should have a "system encoding", which it would get from the environment (locale, Windows codepage), and mark all commit messages with it. This should mark the commits with the correct encoding on most Unix-like systems.

On Windows there is a "DOS codepage" and a "Windows codepage", so there is no completely reliable way of determining the encoding used on the system, although the "Windows codepage" is what most windowed programs use. Still, there should be a way to set the encoding explicitly.

It is a somewhat open question what to do when displaying the timeline of a repository with commits in multiple encodings, though.

Thanks

Michal
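The "system encoding" idea in the message above could be sketched roughly as follows in Python. The helper names `system_encoding` and `to_utf8` are hypothetical and are not part of Fossil; the sketch only shows a guess taken from the environment combined with an explicit override.

```python
import locale


def system_encoding():
    """Best guess at the encoding the local environment uses for text."""
    # On Unix-like systems this follows the locale. On Windows it normally
    # reports the ANSI ("Windows") codepage, not the console ("DOS") one.
    return locale.getpreferredencoding(False)


def to_utf8(raw, encoding=None):
    """Transcode a comment from the given (or guessed) encoding to UTF-8."""
    return raw.decode(encoding or system_encoding()).encode("utf-8")


# Explicit override, for the cases where the guess would be wrong.
utf8_comment = to_utf8("Привет".encode("cp1251"), encoding="cp1251")
```

The explicit `encoding` argument matters because, as the message notes, the guess cannot be fully reliable on Windows.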
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
On 25 June 2010 21:34, Michal Suchanek wrote:
> Perhaps fossil should have a "system encoding" which it would get from
> the environment (locales, windows codepage) and mark all commit
> messages with it.

I vote that this is an extraordinarily bad idea.

Fossil is a *distributed* SCM system. Potentially the distributed database in question could be spread around the world. Do you really want the nightmare (and impossibility!) of trying to keep track of which project is in which encoding scheme on which machine? UTF-8 is a standard *explicitly* designed to *stop* this kind of confusion. It's also been around since 1993, so your development tools have had plenty of time to catch on and actually use it.

--
"Perhaps people don't believe this, but throughout all of the discussions of entering China our focus has really been what's best for the Chinese people. It's not been about our revenue or profit or whatnot."
--Sergey Brin, demonstrating the emptiness of the "don't be evil" mantra.
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
On 25 June 2010 11:15, Sergey Sfeli wrote:
> Question to Richard Hipp: can you please add any UTF-8 character(s) to the
> following text to help text editors to auto-detect the right encoding?
>
> # Enter comments on this check-in. Lines beginning with # are ignored.
> # The check-in comment follows wiki formatting rules.

The correct character to add would be U+FEFF (UTF-8 "\xEF\xBB\xBF"), "ZERO WIDTH NO-BREAK SPACE", commonly called the byte order mark, which may be familiar from showing up as "ï»¿" in badly decoded documents. For sensible programs, this should be a dead giveaway to switch into UTF-8 mode; chief among these programs is Notepad. For programs which assume UTF-8 but don't detect the encoding from BOMs, it should be an invisible character.
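The byte order mark described in the message above can be checked with a short Python sketch. The `template` bytes reuse the comment header quoted in the thread; prefixing it with a BOM is only the proposal under discussion, not something Fossil is stated to do.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF, i.e. U+FEFF in UTF-8


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(BOM)


# Proposed template: the comment header prefixed with a BOM.
template = BOM + b"# Enter comments on this check-in.\n"

# "utf-8-sig" strips a leading BOM, so the reader never sees it.
text = template.decode("utf-8-sig")

# Decoding the BOM bytes as CP1252 gives the familiar three-character debris.
mojibake = BOM.decode("cp1252")
```

A plain "utf-8" decode would instead keep U+FEFF as the first character of the text, which is harmless only as long as the consumer treats it as invisible.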
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
The trouble is that UTF-8 is a poor standard. It bloats many texts, is quite expensive to parse, and has only one redeeming feature: it never creates embedded nulls. I suppose that sharing its encoding with ASCII is a feature too, but only a minor one.

Personally, I think that most systems should adopt SCSU as their storage encoding, but that's unlikely to happen until C strings and MIME (two paragons of awfulness) die out.

On 25 June 2010 16:00, Michael Richter wrote:
> UTF-8 is a standard explicitly designed
> to stop this kind of confusion. It's also been around since 1993, so your
> development tools have had plenty of time to catch on and actually use it.
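The size and embedded-null claims in the message above are easy to check for a concrete string. The sketch below uses a hypothetical sample phrase and compares a legacy single-byte encoding with UTF-8 and UTF-16; it makes no claim about SCSU, for which Python has no built-in codec.

```python
russian = "Привет, мир"      # 9 Cyrillic letters plus a comma and a space
ascii_text = "Hello, world"

# Encoded size in bytes for the same Russian text.
sizes = {enc: len(russian.encode(enc)) for enc in ("cp1251", "utf-8", "utf-16-le")}

# UTF-8 never produces a zero byte for non-NUL characters; UTF-16 does.
utf8_has_null = b"\x00" in russian.encode("utf-8")
utf16_has_null = b"\x00" in russian.encode("utf-16-le")

# Pure ASCII text is byte-for-byte identical in UTF-8.
ascii_unchanged = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
```

For this sample the UTF-8 form is 20 bytes against 11 in CP1251, since each Cyrillic letter takes two bytes in UTF-8 and one in CP1251.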
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
On 25 June 2010 18:09, Owen Shepherd wrote:
> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
> quite expensive to parse, and has only one redeeming feature: It never
> creates embedded nulls.
>
> On 25 June 2010 16:00, Michael Richter wrote:
>> UTF-8 is a standard explicitly designed
>> to stop this kind of confusion. It's also been around since 1993, so your
>> development tools have had plenty of time to catch on and actually use it.

The fact is that Windows is a supported platform, and on Windows common tools do not use UTF-8, for good or for bad. So there should at least be code to identify the system encoding and convert it to the repo encoding.

Also note that UTF-8, and Unicode in general, is not the encoding of choice for CJK languages, for various reasons. I guess it is acceptable to convert from the system encoding to UTF-8 on a best-effort basis (which usually causes minimal loss of information, if any) so that the repository commit messages and other texts shown on the web can be merged together without resorting to iframes or other similar atrocities.

The tracked files themselves are, of course, free to be in any encoding. Still, displaying files in an arbitrary encoding in a UTF-8 web app is somewhat troublesome, so it would be an advantage to be able to start a repo in a different encoding, or to switch the web encoding, so that files in different encodings can be viewed easily. Tagging files with an encoding when Fossil interprets them as text would also be useful.

Thanks

Michal
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
One of the reasons that I'm a fan of SCSU is that, with even a relatively simple encoder, it produces output which is comparable in efficiency to that of most legacy encodings.

On 25 June 2010 18:53, Michal Suchanek wrote:
> Also note that UTF-8 and Unicode in general is not the encoding of
> choice for CJK languages for various reasons.
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
As an FYI, I googled SCSU:
http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode

Owen Shepherd wrote:
> One of the reasons that I'm a fan of SCSU is that, with even a
> relatively simple encoder, it produces output which is comparable in
> efficiency to that of most legacy encodings.

--
Andreas Kupries
Senior Tcl Developer
ActiveState, The Dynamic Language Experts
P: 778.786.1122 F: 778.786.1133
andre...@activestate.com
http://www.activestate.com
Get insights on Open Source and Dynamic Languages at www.activestate.com/blog
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
On 25 June 2010 20:18, Owen Shepherd wrote:
> One of the reasons that I'm a fan of SCSU is that, with even a
> relatively simple encoder, it produces output which is comparable in
> efficiency to that of most legacy encodings.

SCSU is a horrendous encoding because it uses shifts. When a shift is lost, the text that follows takes on a completely different meaning. In UTF-8, if you remove part of the text, only that part is affected (if you cut mid-character you create one bad character at worst, and it can be clearly detected).

Thanks

Michal
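The UTF-8 half of the argument above can be demonstrated with a small Python sketch: cutting a string in the middle of a character damages only that character, and a strict decoder reports the damage. The sample text is hypothetical; SCSU is not shown because Python ships no codec for it.

```python
text = "Привет, мир"
data = text.encode("utf-8")

# Drop the first three bytes: all of "П" plus the lead byte of "р",
# leaving a stray continuation byte at the start of the data.
damaged = data[3:]

# A strict decoder detects the damage...
try:
    damaged.decode("utf-8")
    detected = False
except UnicodeDecodeError:
    detected = True

# ...and a lenient one loses only the broken character, replacing it
# with U+FFFD while the rest of the text decodes normally.
recovered = damaged.decode("utf-8", errors="replace")
```

Everything after the stray byte is intact because UTF-8 lead bytes and continuation bytes occupy disjoint ranges, so a decoder can resynchronize at the next lead byte.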
Re: [fossil-users] May fossil push via ssh?
On 25 June 2010 12:16, Ruslan Popov wrote:
> Hi,
>
> May fossil push via ssh?

Not the way you are used to, I guess. You can start a Fossil server, create an ssh tunnel to it, and then use the HTTP push. Or you can use sshfs.

HTH

Michal
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
UTF-8 in the database, any other encoding at the developer's console. SVN and Git have this feature (configurable encodings for commits and for console output). These are per-developer settings, unrelated to the encoding used for p2p sync.

----- Original Message -----
From: Michael Richter
To: fossil-users@lists.fossil-scm.org
Sent: Friday, June 25, 2010 6:00 PM

Fossil is a distributed SCM system. Potentially the distributed database in question could be spread around the world. Do you really want the nightmare (and impossibility!) of trying to keep track of which project is in which encoding scheme on which machine?
Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project
On 25 June 2010 19:36, Michal Suchanek wrote:
> SCSU is a horrendous encoding because it uses shifts. When the shift
> is lost the text has completely different meaning. In UTF-8 if you
> remove part of the text only that part is affected (if you cut
> mid-character you create a bad character at worst but it can be
> clearly detected).

And how often do you lose a couple of bytes in the middle of a file? More precisely, how often do you lose them and not have a checksum fail (or some other error) notifying you of this?

It's a particularly egregious complaint in the context of Fossil, where all records are hashed anyway! Additionally, if the same kind of error were to occur in the SQLite file that contains the repository, it would probably be trashed irretrievably.

Years of experience with binary and other modal file formats (XML and HTML, to name two very common ones) show that this is a complete non-issue.

SCSU is of course a poor choice for an in-memory format (use UTF-16) or for interacting with the console (for backwards compatibility you're probably going to have to use UTF-8). But for a storage format, particularly one embedded within a database? It's pretty much perfect.
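The checksum argument in the message above can be sketched in a few lines of Python. The record contents, the helper name `is_intact`, and the use of SHA-1 are assumptions for illustration; the thread says only that Fossil hashes its records.

```python
import hashlib

# Hypothetical stored record and the hash recorded alongside it.
record = "Исправлена ошибка".encode("utf-8")
stored_hash = hashlib.sha1(record).hexdigest()  # hash choice is illustrative


def is_intact(data, expected_hash):
    """Return True if the data still matches the hash stored for it."""
    return hashlib.sha1(data).hexdigest() == expected_hash


# Lose two bytes in the middle of the record.
damaged = record[:4] + record[6:]
```

With such a check in place, a lost shift byte would be reported as a corrupt record before any text was ever decoded, whatever the storage encoding.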
Re: [fossil-users] May fossil push via ssh?
Thank you.

On Fri, Jun 25, 2010 at 10:40 PM, Michal Suchanek wrote:
> Not the way you are used to I guess. You can start a fossil server and
> create a ssh tunnel to it and then use the HTTP push. Or you can use
> sshfs.

--
Ruslan Popov