Re: httpd and locales

2006-01-19 Thread André Malo
* Garrett Rooney [EMAIL PROTECTED] wrote:

  It doesn't belong here, but... I'm wondering why the path isn't passed as
  UTF-8. Why is it translated to the locale at all? It's all happening within
  the svn file system, so I'd really expect to get utf-8 and would consider
  locale translation as a bug.
 
 Well, I imagine that the assumption is that any hook script is going
 to be using the actual locale specified in LANG/LC_ALL/etc env
 variables, so if we don't translate to that locale it'll get rather
 confused by utf8 data in its command line.  As a general rule svn
 translates from native -> utf8 on input and from utf8 -> native for
 output.  Ironically, if the LANG/LC_ALL/etc env vars were being
 followed by httpd this translation would be a noop, since the system
 uses a utf8 locale...

So whether the users of a repository (httpd or svnserve) may use the full
unicode range for their files depends on the locale of the server? That feels
just wrong ;-) I don't see how the command line would get confused...

As long as one references files inside the svn filesystem, no translation
should occur at all. It's just Unicode (in UTF-8 format). The only part of
the Subversion system that should deal with recoding repository-stored
paths is a client.

But as said, this doesn't belong here.

nd


Re: httpd and locales

2006-01-19 Thread Branko Čibej

André Malo wrote:

* Garrett Rooney [EMAIL PROTECTED] wrote:

  

It doesn't belong here, but... I'm wondering why the path isn't passed as
UTF-8. Why is it translated to the locale at all? It's all happening within
the svn file system, so I'd really expect to get utf-8 and would consider
locale translation as a bug.
  

Well, I imagine that the assumption is that any hook script is going
to be using the actual locale specified in LANG/LC_ALL/etc env
variables, so if we don't translate to that locale it'll get rather
confused by utf8 data in its command line.  As a general rule svn
translates from native -> utf8 on input and from utf8 -> native for
output.  Ironically, if the LANG/LC_ALL/etc env vars were being
followed by httpd this translation would be a noop, since the system
uses a utf8 locale...



So whether the users of a repository (httpd or svnserve) may use the full
unicode range for their files depends on the locale of the server? That feels
just wrong ;-) I don't see how the command line would get confused...
  
You're confusing the content of the SVN repository and hook scripts 
stored on the local filesystem. Paths in the first are always encoded in 
UTF-8. The latter naturally have to obey the server's locale.


-- Brane



Re: httpd and locales

2006-01-19 Thread André Malo
* Branko Čibej wrote:

 You're confusing the content of the SVN repository and hook scripts
 stored on the local filesystem. Paths in the first are always encoded in
 UTF-8. The latter naturally have to obey the server's locale.

I don't think so. The task was to pass the name of a file stored in the 
repository to a hook script via the command line. Otherwise I must have 
misunderstood something quite heavily.

nd
-- 
The only things that survive a building collapse (or even a
thermonuclear war) unscathed are cockroaches
and AOL CDs.
  -- Bastian Lipp in dcsm


Re: httpd and locales

2006-01-19 Thread Garrett Rooney
On 1/19/06, André Malo [EMAIL PROTECTED] wrote:
 * Branko Čibej wrote:

  You're confusing the content of the SVN repository and hook scripts
  stored on the local filesystem. Paths in the first are always encoded in
  UTF-8. The latter naturally have to obey the server's locale.

 I don't think so. The task was to pass the name of a file stored in the
 repository to a hook script via the command line. Otherwise I must have
 misunderstood something quite heavily.

That is correct: it's an argument to the hook script that happens to
contain the path of a file in the repository.  Currently all arguments
are transcoded from UTF-8 to native before we execute the hook script.

-garrett
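
To make the transcoding step concrete, here is a minimal sketch using
POSIX iconv(). It is not Subversion's actual code (svn goes through its
own conversion layer); utf8_to_native() is a hypothetical helper and the
target codeset name is supplied by the caller. Any lossy or failed
conversion is reported as an error, which is roughly the situation
described above when the native encoding is plain ASCII.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the UTF-8 string `in` to `to_codeset`,
 * writing a NUL-terminated result into out[0..outsize).
 * Returns 0 on success and -1 on failure with errno set.
 * Only suitable for stateless target encodings (ASCII, ISO-8859-x, ...),
 * since no final shift-sequence flush is performed. */
int utf8_to_native(const char *to_codeset, const char *in,
                   char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;      /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    size_t rc;
    int saved_errno;

    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }
    outleft = outsize - 1;       /* keep room for the terminating NUL */

    cd = iconv_open(to_codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;               /* unknown codeset, errno set by iconv_open */

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    saved_errno = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;     /* e.g. EILSEQ or E2BIG */
        return -1;
    }
    if (rc != 0) {
        errno = EILSEQ;          /* some libcs substitute characters instead of failing */
        return -1;
    }
    *outp = '\0';
    return 0;
}
```

With a target of "ASCII", a path such as "trunk/README" converts
unchanged, while a path containing any non-ASCII character is rejected
before a command line could be built from it.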


Re: httpd and locales

2006-01-19 Thread Joe Orton
On Thu, Jan 19, 2006 at 11:09:13AM -0800, Garrett Rooney wrote:
 On 1/19/06, André Malo [EMAIL PROTECTED] wrote:
  * Branko Čibej wrote:
 
   You're confusing the content of the SVN repository and hook scripts
   stored on the local filesystem. Paths in the first are always encoded in
   UTF-8. The latter naturally have to obey the server's locale.
 
  I don't think so. The task was to pass the name of a file stored in the
  repository to a hook script via the command line. Otherwise I must have
  misunderstood something quite heavily.
 
 That is correct, it's an argument to the hook script that happens to
 contain the path of a file in the repository.  Currently all arguments
 are transcoded from utf8 to native before we execute the hook script.

I really don't think that relying on that working properly is a good 
idea.  All it takes is for one rogue PHP script to set the locale to 
some odd locale to be able to print currency symbols properly or 
whatever, and the hook scripts would start behaving really strangely.

As a module author, presuming the locale is undefined is the safest bet, 
and as an administrator, starting the server in the C locale is the 
safest bet.

joe


Re: httpd and locales

2006-01-18 Thread Joe Orton
On Wed, Jan 18, 2006 at 11:17:30AM -0800, Garrett Rooney wrote:
 Is there any particular reason that httpd never does the
 'setlocale(LC_ALL, "");' magic necessary to get libc to respect the
 various locale related environment variables?  As far as I can tell,
 despite system settings for locale (i.e. /etc/sysconfig/i18n on RHEL)
 httpd always runs with a locale of C, which is fine for most things,
 but pretty irritating if you have a need to do stuff with multibyte
 strings in a module.
 
 Just adding a call to setlocale with a "" locale in httpd's main makes
 my particular problem go away, but I'm kind of hesitant to propose
 actually doing so since I don't know what kind of fallout there would
 be from having httpd all of a sudden start respecting the environment
 variables...

Ideally the locale shouldn't matter, but in practice it does: notably 
strcasecmp() and the is* functions behave differently.  This can cause 
things to fail in surprising ways, so changing it is generally to be avoided.

Various modules will do it at startup anyway, so it's hard to avoid 
completely, but it's not something that I'd really advise propagating.

joe
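
One way a module can sidestep the strcasecmp() issue is to compare
protocol tokens with ASCII-only rules that never consult the locale. The
sketch below is illustrative only; ascii_tolower() and ascii_strcasecmp()
are hypothetical names, not httpd or APR functions.

```c
#include <stddef.h>

/* Lowercase one byte using ASCII rules only. Bytes outside 'A'..'Z'
 * are returned unchanged, whatever the current locale is. */
unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive comparison with strcasecmp()-style return values
 * (negative, zero, positive), but independent of setlocale(). */
int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;; pa++, pb++) {
        unsigned char ca = ascii_tolower(*pa);
        unsigned char cb = ascii_tolower(*pb);

        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;            /* both strings ended together */
    }
}
```

For example, ascii_strcasecmp("Content-Type", "content-type") is 0 in
every locale. Turkish locales are the commonly cited case where the
locale-aware functions may not treat "I" and "i" as a case pair.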




Re: httpd and locales

2006-01-18 Thread Garrett Rooney
On 1/18/06, Joe Orton [EMAIL PROTECTED] wrote:
 On Wed, Jan 18, 2006 at 11:17:30AM -0800, Garrett Rooney wrote:
  Is there any particular reason that httpd never does the
  'setlocale(LC_ALL, "");' magic necessary to get libc to respect the
  various locale related environment variables?  As far as I can tell,
  despite system settings for locale (i.e. /etc/sysconfig/i18n on RHEL)
  httpd always runs with a locale of C, which is fine for most things,
  but pretty irritating if you have a need to do stuff with multibyte
  strings in a module.
 
  Just adding a call to setlocale with a "" locale in httpd's main makes
  my particular problem go away, but I'm kind of hesitant to propose
  actually doing so since I don't know what kind of fallout there would
  be from having httpd all of a sudden start respecting the environment
  variables...

 Ideally the locale shouldn't matter, but in practice it does: notably
 strcasecmp() and the is* functions behave differently.  This can cause
 things to fail in surprising ways, so it's generally to be avoided.

 Various modules will do it at startup anyway, so it's hard to avoid
 completely, but it's not something that I'd really advise propagating.

The specific problem I'm trying to fix is that mod_dav_svn fails to
run a pre-lock hook script when you try to lock a filename with
double-byte characters.  It never even gets to the point of trying to
run the script; it fails while building the command line because it
can't convert the filename from UTF-8 to the native encoding: the
locale is C, so the native encoding is 7-bit ASCII.  I'm having
trouble finding a workaround for this that doesn't involve setting
the locale, although if there's anything obvious I'm missing I'd love
to hear it.

-garrett
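
A small sketch of the mechanics described above, assuming a POSIX libc.
is_ascii_only() and adopt_env_locale_codeset() are hypothetical helper
names. The first shows which UTF-8 paths can survive conversion to
7-bit ASCII at all; the second shows the setlocale(LC_ALL, "") call and
how to ask libc which encoding the resulting locale implies.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Returns 1 if every byte of s is 7-bit ASCII, 0 otherwise.
 * Every UTF-8 sequence for a non-ASCII character contains bytes >= 0x80,
 * so such a path has no representation when the native encoding is ASCII. */
int is_ascii_only(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; p++) {
        if (*p >= 0x80)
            return 0;
    }
    return 1;
}

/* Adopt the locale named by LC_ALL / LC_* / LANG and return the name of
 * its character encoding. If the environment names a locale that is not
 * installed, setlocale() fails and the previous locale stays in effect.
 * Note that this changes process-wide state. */
const char *adopt_env_locale_codeset(void)
{
    (void)setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

In a process that never calls setlocale(), nl_langinfo(CODESET)
typically reports an ASCII codeset (the exact name varies by libc);
under a UTF-8 environment, adopt_env_locale_codeset() would be expected
to report "UTF-8".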


Re: httpd and locales

2006-01-18 Thread André Malo
* Garrett Rooney wrote:

 The specific problem I'm trying to fix is that mod_dav_svn fails to
 run a pre-lock hook script when you try to lock a filename with double
 byte characters.  It never even gets to the point of trying to run the
 script, it fails trying to build the command line because it can't
 convert the filename from utf8 to the native encoding because the
 locale is C and thus the native encoding is 7 bit ascii.  I'm having
 trouble finding a work around for this that doesn't involve setting
 the locale, although if there's anything obvious I'm missing I'd love
 to hear it.

It doesn't belong here, but... I'm wondering why the path isn't passed as 
UTF-8. Why is it translated to the locale at all? It's all happening within 
the svn file system, so I'd really expect to get utf-8 and would consider 
locale translation as a bug.

nd
-- 
Gates's behavior had proven to me that I need not count on him and his
two companions -- Karl May, Winnetou III

Something new in the West: http://pub.perlig.de/books.html#apache2


Re: httpd and locales

2006-01-18 Thread Garrett Rooney
On 1/18/06, André Malo [EMAIL PROTECTED] wrote:
 * Garrett Rooney wrote:

  The specific problem I'm trying to fix is that mod_dav_svn fails to
  run a pre-lock hook script when you try to lock a filename with double
  byte characters.  It never even gets to the point of trying to run the
  script, it fails trying to build the command line because it can't
  convert the filename from utf8 to the native encoding because the
  locale is C and thus the native encoding is 7 bit ascii.  I'm having
  trouble finding a work around for this that doesn't involve setting
  the locale, although if there's anything obvious I'm missing I'd love
  to hear it.

 It doesn't belong here, but... I'm wondering why the path isn't passed as
 UTF-8. Why is it translated to the locale at all? It's all happening within
 the svn file system, so I'd really expect to get utf-8 and would consider
 locale translation as a bug.

Well, I imagine that the assumption is that any hook script is going
to be using the actual locale specified in LANG/LC_ALL/etc env
variables, so if we don't translate to that locale it'll get rather
confused by utf8 data in its command line.  As a general rule svn
translates from native -> utf8 on input and from utf8 -> native for
output.  Ironically, if the LANG/LC_ALL/etc env vars were being
followed by httpd this translation would be a noop, since the system
uses a utf8 locale...

-garrett