[fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Ruslan Popov
Hi all,

I've tried to use Fossil on the Russian version of Windows 7. I made a commit with
Russian text in the comment; when I ran the UI and looked at the timeline, the
Russian text showed up as squares. Is there any way to set the character encoding
of commit comments? It would be nice if I could convert them to UTF-8 automatically
while committing. If there is no such ability, can I change the text of existing
commit comments (into English, for example)?

-- 
Ruslan Popov
phone: +7 916 926 1205
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Sergey Sfeli
Ruslan Popov wrote:

> I've tried to use Fossil on russian version of Windows 7. I made commit with
> russian text in comment, when I run the UI and look at timeline, I saw that
> russian text looks like squares.

Why not just use a text editor that supports UTF-8 and write your
comments in UTF-8 instead of cp1251? You can set or change the default text
editor with "fossil set editor anything-else-than-notepad" (I am using
Notepad2, for example).


Question to Richard Hipp: could you please add some non-ASCII UTF-8 character(s)
to the following text to help text editors auto-detect the right encoding?

# Enter comments on this check-in.  Lines beginning with # are ignored.
# The check-in comment follows wiki formatting rules.



[fossil-users] May fossil push via ssh?

2010-06-25 Thread Ruslan Popov
Hi,

Can Fossil push via ssh?

-- 
Ruslan Popov
phone: +7 916 926 1205


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Ruslan Popov
Sergey, I now use Emacs with its mule-utf-8-unix coding system for the commit buffer.

On Fri, Jun 25, 2010 at 2:15 PM, Sergey Sfeli wrote:

> Ruslan Popov wrote:
>
> > I've tried to use Fossil on russian version of Windows 7. I made commit
> with
> > russian text in comment, when I run the UI and look at timeline, I saw
> that
> > russian text looks like squares.
>
> Why don't just use text editor that supports UTF-8 and write your
> comments in UTF-8 instead of cp1251? You can set/change default text
> editor with "fossil set editor anything-else-than-notepad" (I am using
> Notepad2, for example).
>
>
> Question to Richard Hipp: can you please add any UTF-8 character(s) to the
> following text to help text editors to auto-detect the right encoding?
>
> # Enter comments on this check-in.  Lines beginning with # are ignored.
> # The check-in comment follows wiki formatting rules.

-- 
Ruslan Popov
phone: +7 916 926 1205


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Michal Suchanek
On 25 June 2010 12:15, Sergey Sfeli wrote:
> Ruslan Popov wrote:
>
>> I've tried to use Fossil on russian version of Windows 7. I made commit with
>> russian text in comment, when I run the UI and look at timeline, I saw that
>> russian text looks like squares.
>
> Why don't just use text editor that supports UTF-8 and write your
> comments in UTF-8 instead of cp1251? You can set/change default text
> editor with "fossil set editor anything-else-than-notepad" (I am using
> Notepad2, for example).
>
>
> Question to Richard Hipp: can you please add any UTF-8 character(s) to the
> following text to help text editors to auto-detect the right encoding?
>
> # Enter comments on this check-in.  Lines beginning with # are ignored.
> # The check-in comment follows wiki formatting rules.
>

Perhaps fossil should have a "system encoding" which it would get from
the environment (locales, windows codepage) and mark all commit
messages with it.

This should mark the commits with the correct encoding on most
Unix-like systems. On Windows there is a "DOS codepage" and a "Windows
codepage", so there is no completely reliable way of determining the
encoding used on the system, although the "Windows codepage" would be
what most windowed programs use. Still, there should be a possibility
to set the encoding explicitly.
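
To make the idea concrete, here is a minimal Python sketch (not Fossil code) of "take the encoding from the environment unless it is set explicitly, then convert to UTF-8". The helper name comment_to_utf8 and its source_encoding parameter are made up for illustration; locale.getpreferredencoding() is only one way of asking the system, and on Windows it reports the "Windows codepage" rather than the "DOS codepage".

```python
import locale


def comment_to_utf8(raw, source_encoding=None):
    """Re-encode raw commit-comment bytes as UTF-8 (hypothetical helper)."""
    # An explicitly configured encoding wins; otherwise ask the environment.
    encoding = source_encoding or locale.getpreferredencoding(False)
    # Best effort: bytes that are invalid in the source encoding become U+FFFD.
    text = raw.decode(encoding, errors="replace")
    return text.encode("utf-8")


# A comment as an editor working in CP1251 would have saved it.
raw_comment = "Привет".encode("cp1251")
utf8_comment = comment_to_utf8(raw_comment, source_encoding="cp1251")
print(len(raw_comment), len(utf8_comment))  # 6 12
```

Passing the encoding explicitly, as in the last lines, corresponds to the "set the encoding explicitly" escape hatch; leaving it out falls back to whatever the environment reports.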

It is a somewhat open question what to do when displaying the timeline of a
repository with commits in multiple encodings, though.

Thanks

Michal


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Michael Richter
On 25 June 2010 21:34, Michal Suchanek wrote:

> Perhaps fossil should have a "system encoding" which it would get from
> the environment (locales, windows codepage) and mark all commit
> messages with it.
>

I vote that this is an extraordinarily bad idea.

Fossil is a *distributed* SCM system.  Potentially the distributed database
in question could be spread around the world.  Do you really want the
nightmare (and impossibility!) of trying to keep track of which project is
in which encoding scheme on which machine?  UTF-8 is a standard *explicitly*
designed to *stop* this kind of confusion.  It's also been around since 1993, so your
development tools have had plenty of time to catch on and actually use it.

-- 
"Perhaps people don't believe this, but throughout all of the discussions of
entering China our focus has really been what's best for the Chinese people.
It's not been about our revenue or profit or whatnot."
--Sergey Brin, demonstrating the emptiness of the "don't be evil" mantra.


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Owen Shepherd
On 25 June 2010 11:15, Sergey Sfeli wrote:
>
> Ruslan Popov wrote:
>
> > I've tried to use Fossil on russian version of Windows 7. I made commit with
> > russian text in comment, when I run the UI and look at timeline, I saw that
> > russian text looks like squares.
>
> Why don't just use text editor that supports UTF-8 and write your
> comments in UTF-8 instead of cp1251? You can set/change default text
> editor with "fossil set editor anything-else-than-notepad" (I am using
> Notepad2, for example).
>
>
> Question to Richard Hipp: can you please add any UTF-8 character(s) to the
> following text to help text editors to auto-detect the right encoding?
>
> # Enter comments on this check-in.  Lines beginning with # are ignored.
> # The check-in comment follows wiki formatting rules.

The correct character to add would be U+FEFF (UTF-8 "\xEF\xBB\xBF")
"ZERO WIDTH NO-BREAK SPACE", commonly called the byte order mark,
which may be familiar from showing up as "ï»¿" in badly encoded
documents. For sensible programs, this should be a dead giveaway to
switch into UTF-8 mode; chief among these programs is Notepad. For
programs which assume UTF-8 but don't detect the encoding from BOMs, it
should be an invisible character.
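
A small Python sketch of the byte-level facts above, for anyone who wants to check them. The template line is shortened from the check-in prompt quoted earlier, and sniff_utf8_bom is a made-up name; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs

BOM = "\ufeff"  # U+FEFF

# Encoded as UTF-8, the BOM is exactly the three bytes EF BB BF.
assert BOM.encode("utf-8") == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def sniff_utf8_bom(data):
    """Return True if the bytes start with a UTF-8 BOM (hypothetical helper)."""
    return data.startswith(codecs.BOM_UTF8)


template = codecs.BOM_UTF8 + b"# Enter comments on this check-in.\n"
print(sniff_utf8_bom(template))  # True

# A BOM-aware decoder drops the mark, so the text starts with "#".
text = template.decode("utf-8-sig")
print(text.startswith("#"))  # True

# Misread as Windows-1252, the same three bytes become three visible characters.
mojibake = codecs.BOM_UTF8.decode("cp1252")
print(len(mojibake))  # 3
```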


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Owen Shepherd
The trouble is that UTF-8 is a poor standard. It bloats many texts, is
quite expensive to parse, and has only one redeeming feature: It never
creates embedded nulls. I suppose the fact that it shares its encoding
with ASCII is a feature too, but only a minor one.
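
To put rough numbers on the size and embedded-null points, here is a short Python sketch. The sample string is just an illustration; the sizes depend entirely on the text being encoded.

```python
sample = "Привет, мир"  # 9 Cyrillic letters plus a comma and a space

for encoding in ("cp1251", "utf-8", "utf-16-le"):
    print(encoding, len(sample.encode(encoding)))
# cp1251 11
# utf-8 20
# utf-16-le 22

# ASCII text is byte-for-byte identical in UTF-8 ...
assert "fossil".encode("utf-8") == "fossil".encode("ascii")
# ... and UTF-8 only produces a zero byte if the text itself contains U+0000,
# whereas UTF-16 produces one for every ASCII character.
assert b"\x00" not in sample.encode("utf-8")
assert b"\x00" in "fossil".encode("utf-16-le")
```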

Personally, I think that most systems should adopt SCSU as their
storage encoding, but that's unlikely to happen until C strings and
MIME (two paragons of awfulness) die out.

On 25 June 2010 16:00, Michael Richter wrote:
> On 25 June 2010 21:34, Michal Suchanek  wrote:
>>
>> Perhaps fossil should have a "system encoding" which it would get from
>> the environment (locales, windows codepage) and mark all commit
>> messages with it.
>
> I vote that this is an extraordinarily bad idea.
> Fossil is a distributed SCM system.  Potentially the distributed database in
> question could be spread around the world.  Do you really want the nightmare
> (and impossibility!) of trying to keep track of which project is in which
> encoding scheme on which machine?  UTF-8 is a standard explicitly designed
> to stop this kind of confusion.  It's also been around since 1993, so your
> development tools have had plenty of time to catch on and actually use it.
> --
> "Perhaps people don't believe this, but throughout all of the discussions of
> entering China our focus has really been what's best for the Chinese people.
> It's not been about our revenue or profit or whatnot."
> --Sergey Brin, demonstrating the emptiness of the "don't be evil" mantra.


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Michal Suchanek
On 25 June 2010 18:09, Owen Shepherd wrote:
> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
> quite expensive to parse, and has only one redeeming feature: It never
> creates embedded nulls. I suppose that it shares its encoding with
> ASCII is a feature too, but only a minor one.
>
> Personally, I think that most systems should adopt SCSU as their
> storage encoding, but that's unlikely to happen until C strings and
> MIME (two paragons of awfulness) die out.
>
> On 25 June 2010 16:00, Michael Richter  wrote:
>> On 25 June 2010 21:34, Michal Suchanek  wrote:
>>>
>>> Perhaps fossil should have a "system encoding" which it would get from
>>> the environment (locales, windows codepage) and mark all commit
>>> messages with it.
>>
>> I vote that this is an extraordinarily bad idea.
>> Fossil is a distributed SCM system.  Potentially the distributed database in
>> question could be spread around the world.  Do you really want the nightmare
>> (and impossibility!) of trying to keep track of which project is in which
>> encoding scheme on which machine?  UTF-8 is a standard explicitly designed
>> to stop this kind of confusion.  It's also been around since 1993, so your
>> development tools have had plenty of time to catch on and actually use it.

The fact is that Windows is a supported platform, and on Windows common
tools do not use UTF-8, for better or worse. So there should at least
be code to identify the system encoding and convert it to the repo
encoding.

Also note that UTF-8, and Unicode in general, is not the encoding of
choice for CJK languages for various reasons. I guess it is acceptable
to convert from the system encoding to UTF-8 on a best-effort basis
(which usually causes minimal loss of information, if any) so that the
repository commit messages and other texts shown on the web can be
merged together without resorting to iframes or other similar
atrocities.

The tracked files themselves are, of course, free to be in any
encoding. Still, displaying files in an arbitrary encoding in a UTF-8 web
app is somewhat troublesome, so it would be an advantage to have the
possibility to start a repo in a different encoding or to switch the web
encoding so that files in different encodings can be viewed easily.
Tagging the files with an encoding when they are interpreted as text
by fossil would also be useful.

Thanks

Michal


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Owen Shepherd
One of the reasons that I'm a fan of SCSU is that, with even a
relatively simple encoder, it produces output which is comparable in
efficiency to that of most legacy encodings.

On 25 June 2010 18:53, Michal Suchanek wrote:
> On 25 June 2010 18:09, Owen Shepherd  wrote:
>> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
>> quite expensive to parse, and has only one redeeming feature: It never
>> creates embedded nulls. I suppose that it shares its encoding with
>> ASCII is a feature too, but only a minor one.
>>
>> Personally, I think that most systems should adopt SCSU as their
>> storage encoding, but that's unlikely to happen until C strings and
>> MIME (two paragons of awfulness) die out.
>>
>> On 25 June 2010 16:00, Michael Richter  wrote:
>>> On 25 June 2010 21:34, Michal Suchanek wrote:
>>>>
>>>> Perhaps fossil should have a "system encoding" which it would get from
>>>> the environment (locales, windows codepage) and mark all commit
>>>> messages with it.
>>>
>>> I vote that this is an extraordinarily bad idea.
>>> Fossil is a distributed SCM system.  Potentially the distributed database in
>>> question could be spread around the world.  Do you really want the nightmare
>>> (and impossibility!) of trying to keep track of which project is in which
>>> encoding scheme on which machine?  UTF-8 is a standard explicitly designed
>>> to stop this kind of confusion.  It's also been around since 1993, so your
>>> development tools have had plenty of time to catch on and actually use it.
>
> The fact is that Windows is a supported platform and on Windows common
> tools do not use UTF-8 for good or for bad. So there should at least
> be the code to identify the system encoding and convert it to the repo
> encoding.
>
> Also note that UTF-8 and Unicode in general is not the encoding of
> choice for CJK languages for various reasons. I guess it is acceptable
> to convert from the system encoding to UTF-8 on a best-effort basis
> (which usually causes minimal loss of information if any) so that the
> repository commit messages and other texts shown on the web can be
> merged together without resorting to iframes or other similar
> atrocities.
>
> The tracked files themselves are, of course, free to be in any
> encoding. Still displaying files in arbitrary encoding on an UTF-8 web
> app is somewhat troublesome so it would be an advantage to have the
> possibility to start a repo in different encoding or to switch the web
> encoding so that files in different encodings can be viewed easily.
> Tagging the files with an encoding when they are interpreted as text
> by fossil would be also useful.
>
> Thanks
>
> Michal


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Andreas Kupries

As an FYI, I googled SCSU:

http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode

Owen Shepherd wrote:
> One of the reasons that I'm a fan of SCSU is that, with even a
> relatively simple encoder, it produces output which is comparable in
> efficiency to that of most legacy encodings.
> 
> On 25 June 2010 18:53, Michal Suchanek  wrote:
>> On 25 June 2010 18:09, Owen Shepherd  wrote:
>>> The trouble is that UTF-8 is a poor standard. It bloats many texts, is
>>> quite expensive to parse, and has only one redeeming feature: It never
>>> creates embedded nulls. I suppose that it shares its encoding with
>>> ASCII is a feature too, but only a minor one.
>>>
>>> Personally, I think that most systems should adopt SCSU as their
>>> storage encoding, but that's unlikely to happen until C strings and
>>> MIME (two paragons of awfulness) die out.
>>>
>>> On 25 June 2010 16:00, Michael Richter  wrote:
>>>> On 25 June 2010 21:34, Michal Suchanek wrote:
>>>>> Perhaps fossil should have a "system encoding" which it would get from
>>>>> the environment (locales, windows codepage) and mark all commit
>>>>> messages with it.
>>>> I vote that this is an extraordinarily bad idea.
>>>> Fossil is a distributed SCM system.  Potentially the distributed database in
>>>> question could be spread around the world.  Do you really want the nightmare
>>>> (and impossibility!) of trying to keep track of which project is in which
>>>> encoding scheme on which machine?  UTF-8 is a standard explicitly designed
>>>> to stop this kind of confusion.  It's also been around since 1993, so your
>>>> development tools have had plenty of time to catch on and actually use it.
>> The fact is that Windows is a supported platform and on Windows common
>> tools do not use UTF-8 for good or for bad. So there should at least
>> be the code to identify the system encoding and convert it to the repo
>> encoding.


-- 
Andreas Kupries
Senior Tcl Developer
ActiveState, The Dynamic Language Experts

P: 778.786.1122
F: 778.786.1133
andre...@activestate.com
http://www.activestate.com
Get insights on Open Source and Dynamic Languages at www.activestate.com/blog


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Michal Suchanek
On 25 June 2010 20:18, Owen Shepherd wrote:
> One of the reasons that I'm a fan of SCSU is that, with even a
> relatively simple encoder, it produces output which is comparable in
> efficiency to that of most legacy encodings.

SCSU is a horrendous encoding because it uses shifts. When a shift
is lost, the text after it takes on a completely different meaning. In
UTF-8, if you remove part of the text, only that part is affected (if you
cut mid-character you create one bad character at worst, and it can be
clearly detected).

Thanks

Michal


Re: [fossil-users] May fossil push via ssh?

2010-06-25 Thread Michal Suchanek
On 25 June 2010 12:16, Ruslan Popov wrote:
> Hi,
>
> May fossil push via ssh?
>

Not in the way you are used to, I guess. You can start a fossil server,
create an ssh tunnel to it, and then push over HTTP through the tunnel. Or you
can use sshfs.

HTH

Michal


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Andrey Cherezov
Use UTF-8 in the database and any other encoding at the developer's console. SVN
and Git have this feature (configurable encodings for commits and for console
output); these are per-developer settings, not related to the p2p sync encoding.
  - Original Message - 
  From: Michael Richter 
  To: fossil-users@lists.fossil-scm.org 
  Sent: Friday, June 25, 2010 6:00 PM

  Fossil is a distributed SCM system.  Potentially the distributed database in
  question could be spread around the world.  Do you really want the nightmare
  (and impossibility!) of trying to keep track of which project is in which
  encoding scheme on which machine?


Re: [fossil-users] Mix of UTF-8 and CP1251 (Russian cyrillic) in project

2010-06-25 Thread Owen Shepherd
On 25 June 2010 19:36, Michal Suchanek wrote:
> On 25 June 2010 20:18, Owen Shepherd  wrote:
>> One of the reasons that I'm a fan of SCSU is that, with even a
>> relatively simple encoder, it produces output which is comparable in
>> efficiency to that of most legacy encodings.
>
> SCSU is a horrendous encoding because it uses shifts. When the shift
> is lost the text has completely different meaning. In UTF-8 if you
> remove part of the text only that part is affected (if you cut
> mid-character you create a bad character at worst but it can be
> clearly detected).

And how often do you lose a couple of bytes in the middle of a file?
More precisely, how often do you lose them and not have a checksum
fail (or some other error) notifying you of this?

It's a particularly egregious complaint in the context of Fossil -
where all records are hashed anyway! Additionally, if the same kind of
error were to occur to the SQLite file that the repository is
contained within, it would probably be trashed irretrievably.
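
As a rough illustration of the checksum argument (a sketch only, not how Fossil stores or verifies anything): a digest taken when a record is written no longer matches once a couple of bytes go missing. SHA-1 via Python's hashlib is used here as an assumption for the sake of the example, and the record contents are made up.

```python
import hashlib

record = "Commit comment: исправлена кодировка".encode("utf-8")
stored_digest = hashlib.sha1(record).hexdigest()  # taken when the record is written

# Lose a couple of bytes somewhere in the middle.
damaged = record[:10] + record[12:]

# The intact record still verifies; the damaged one does not.
intact_ok = hashlib.sha1(record).hexdigest() == stored_digest
damaged_ok = hashlib.sha1(damaged).hexdigest() == stored_digest
print(intact_ok, damaged_ok)  # True False
```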

Years of experience with binary and other modal file formats (XML and
HTML, to name two very common ones) show that this is a complete non-issue.

SCSU is of course a poor choice for an in-memory format (use UTF-16)
or for interacting with the console (for backwards compatibility you're
probably going to have to use UTF-8). But for a storage format,
particularly one embedded within a database? It's pretty much perfect.


Re: [fossil-users] May fossil push via ssh?

2010-06-25 Thread Ruslan Popov
Thank you.

On Fri, Jun 25, 2010 at 10:40 PM, Michal Suchanek wrote:

> On 25 June 2010 12:16, Ruslan Popov  wrote:
> > Hi,
> >
> > May fossil push via ssh?
> >
>
> Not the way you are used to I guess. You can start a fossil server and
> create a ssh tunnel to it and then use the HTTP push. Or you can use
> sshfs.
>
> HTH
>
> Michal

-- 
Ruslan Popov
phone: +7 916 926 1205