Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-08-02 Thread Arfrever Frehtes Taifersar Arahesis
2010-07-31 22:25:26 Petteri Räty wrote:
 On 07/31/2010 11:10 PM, Arfrever Frehtes Taifersar Arahesis wrote:
 
  If the variable is set but not exported then it is local to the shell
  env.  When bash goes to exec() python the local shell variables are
  not in the env; so os.environ will not contain them.
 
  anta...@kyoto ~ $ foo=BAR
  anta...@kyoto ~ $ echo $foo
  BAR
  anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
  None
  anta...@kyoto ~ $ export foo
  anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
  BAR
  
  I want only variables exported to Python processes.
  
 
 export -p

It would have to be parsed using e.g. grep and sed. It's easier to call Python 
in this case.
The call to Python is sufficiently fast:

$ time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))' > /dev/null

real    0m0.062s
user    0m0.051s
sys     0m0.011s

-- 
Arfrever Frehtes Taifersar Arahesis




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-08-02 Thread Arfrever Frehtes Taifersar Arahesis
A milder warning will be printed.

-- 
Arfrever Frehtes Taifersar Arahesis
--- python.eclass
+++ python.eclass
@@ -355,6 +355,8 @@
 	# Check if phase is pkg_setup().
 	[[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
 
+	local locale
+
 	if [[ $# -ne 0 ]]; then
 		die "${FUNCNAME}() does not accept arguments"
 	fi
@@ -407,6 +409,15 @@
 		unset -f python_pkg_setup_check_USE_flags
 	fi
 
+	if [[ $(locale charmap) != UTF-8 ]]; then
+		locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
+		ewarn
+		ewarn "Currently used locale '${locale}' can cause UnicodeDecodeError or UnicodeEncodeError"
+		ewarn "exceptions. It is recommended to use a UTF-8 locale to avoid problems."
+		ewarn "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to change locale."
+		ewarn
+	fi
+
 	PYTHON_PKG_SETUP_EXECUTED=1
 }
 




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-08-02 Thread Harald van Dijk
On Mon, Aug 02, 2010 at 11:02:20PM +0200, Arfrever Frehtes Taifersar Arahesis 
wrote:
 It would have to be parsed using e.g. grep and sed. It's easier to call 
 Python in this case.

It's even easier not to.

 The call to Python is sufficiently fast:
 
 $ time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))' > /dev/null
 
 real    0m0.062s
 user    0m0.051s
 sys     0m0.011s

Let's compare. On my system:

time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))'
en_GB.UTF-8

real    0m0.020s
user    0m0.016s
sys     0m0.004s

time sh -c 'echo ${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}}'
en_GB.UTF-8

real    0m0.001s
user    0m0.000s
sys     0m0.000s

And that's after several runs for both, so it's not caused by the
initial load of python, which wasn't in memory yet.

Yes, 0.019s is very little, but in this case I see absolutely no benefit
whatsoever in calling python. Plus sh has the advantage of actually
working when LC_ALL is exported as "" (which in LC_* means the same as
having it unset)...
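
The empty-LC_ALL case mentioned above can be sketched in Python. This is only an illustration, not code from the eclass: naive_locale mirrors the nested os.environ.get() one-liner from the patch, and effective_locale is a hypothetical helper that treats an empty value as unset, the way ${VAR:-default} does in sh.

```python
def naive_locale(env):
    # Mirrors the nested os.environ.get() one-liner: a variable that is
    # present but empty still wins, because .get() only checks for presence.
    return env.get("LC_ALL", env.get("LC_CTYPE", env.get("LANG", "POSIX")))


def effective_locale(env):
    # Hypothetical helper: skip variables that are unset *or* empty, which is
    # what ${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}} does in sh.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "POSIX"


# Example environment with LC_ALL exported as an empty string.
env = {"LC_ALL": "", "LANG": "en_GB.UTF-8"}
print(repr(naive_locale(env)))      # the empty string
print(repr(effective_locale(env)))  # falls through to LANG
```

Both functions take a plain mapping, so os.environ can be passed in directly.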

But why exactly are you concerned about LC_* being defined but not
exported anyway? You're checking from an ebuild; locales are going to
get inherited from portage or profile.env anyway, so you can just
assume that if they _are_ set, they're exported. The only way they might
not be is if the user is messing with the locale from the bashrc, and if
the user's doing that, the user really needs to fix the bashrc and
export the vars anyway.

None of this changes the fact that the locale check warns about bugs in packages,
not bugs in the user's configuration.



Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-08-02 Thread Jeroen Roovers
On Mon, 2 Aug 2010 23:18:59 +0200
Arfrever Frehtes Taifersar Arahesis arfre...@gentoo.org wrote:

 A milder warning will be printed.

I distinctly remember several voices being raised in this thread very
recently, suggesting if not demanding that you should not convey a
message like that at all, but fix the affected packages instead.


 jer



Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-08-02 Thread Jeroen Roovers
On Mon, 2 Aug 2010 23:18:59 +0200
Arfrever Frehtes Taifersar Arahesis arfre...@gentoo.org wrote:

 + ewarn "exceptions. It is recommended to use a UTF-8 locale to avoid problems."
 + ewarn "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to change locale."

In fact the documentation you point to positively encourages
users/admins to set up locales and explains how to do it system-wide,
and in no place does it warn against any adverse effects of doing so.
So you can't even point to that documentation in defence of this milder
warning.



 jer



Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-08-02 Thread Brian Harring
On Mon, Aug 02, 2010 at 11:18:59PM +0200, Arfrever Frehtes Taifersar Arahesis 
wrote:
 A milder warning will be printed.

Guessing you didn't get the part about "no warning should be put in" 
that everyone stated?  You're ignoring that this message also will 
make users think that switching their locale will magically fix 
programs that chuck encoding errors (validly so, if not particularly 
user friendly) when running into improperly encoded files (regardless 
of locale).

This locale crap doesn't belong in the tree, mild warning or not- do 
not add it.  Take it up to the council if you really think everyone 
else is wrong and still want it.

~harring




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-31 Thread Arfrever Frehtes Taifersar Arahesis
2010-07-30 04:36:22 Brian Harring wrote:
 On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis 
 wrote:
  --- python.eclass
  +++ python.eclass
  @@ -355,6 +355,8 @@
  # Check if phase is pkg_setup().
  [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
   
  +   local locale
  +
  if [[ $# -ne 0 ]]; then
  die "${FUNCNAME}() does not accept arguments"
  fi
  @@ -407,6 +409,16 @@
  unset -f python_pkg_setup_check_USE_flags
  fi
   
  +   locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
 
 You're using python to get the exported env.  Don't.  Use bash (you're 
 invoking python from freaking bash after all)...

Given variable can be set, but not exported.

 bug 328047 is induced by a patch we add (it's not in upstream python).  

This patch comes from upstream.

-- 
Arfrever Frehtes Taifersar Arahesis




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-31 Thread Alec Warner
On Sat, Jul 31, 2010 at 7:44 AM, Arfrever Frehtes Taifersar Arahesis
arfre...@gentoo.org wrote:
 2010-07-30 04:36:22 Brian Harring wrote:
 On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar 
 Arahesis wrote:
  --- python.eclass
  +++ python.eclass
  @@ -355,6 +355,8 @@
      # Check if phase is pkg_setup().
      [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
 
  +   local locale
  +
      if [[ $# -ne 0 ]]; then
              die "${FUNCNAME}() does not accept arguments"
      fi
  @@ -407,6 +409,16 @@
              unset -f python_pkg_setup_check_USE_flags
      fi
 
  +   locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')

 You're using python to get the exported env.  Don't.  Use bash (you're
 invoking python from freaking bash after all)...

 Given variable can be set, but not exported.

If the variable is set but not exported then it is local to the shell
env.  When bash goes to exec() python the local shell variables are
not in the env; so os.environ will not contain them.

anta...@kyoto ~ $ foo=BAR
anta...@kyoto ~ $ echo $foo
BAR
anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
None
anta...@kyoto ~ $ export foo
anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
BAR

so how is this any different than:

[[ -n $LC_CTYPE ]] && locale=$LC_CTYPE
[[ -n $LC_ALL ]] && locale=$LC_ALL
locale=${locale:-POSIX}

if you want to keep it short; or the longer version with more ifs and
less shell magic.  Normally I'm not a big performance man myself; but
this is in an eclass used by lots of packages; not just one ebuild.
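
The exported versus non-exported behaviour shown in the transcript above can be reproduced from Python as a rough sketch. The variable name DEMO_LOCALE_VAR and the helper child_view are made up for this illustration; the sketch assumes an sh binary is on PATH.

```python
import os
import subprocess
import sys

# Hypothetical variable name, chosen so it does not clash with a real setting.
VAR = "DEMO_LOCALE_VAR"


def child_view(shell_setup):
    """Run shell_setup in sh, then exec a Python child and return what it sees."""
    # "$0" expands to the Python interpreter path passed as the extra argument.
    script = shell_setup + '; exec "$0" -c "import os; print(os.environ.get(\'%s\'))"' % VAR
    # Start from an environment where the variable is definitely absent.
    env = {k: v for k, v in os.environ.items() if k != VAR}
    result = subprocess.run(
        ["sh", "-c", script, sys.executable],
        env=env, stdout=subprocess.PIPE, check=True, universal_newlines=True,
    )
    return result.stdout.strip()


print(child_view(VAR + "=BAR"))                 # set but not exported
print(child_view(VAR + "=BAR; export " + VAR))  # set and exported
```

The child only ever sees what the shell placed in its environment, so a shell-side check of exported variables and a Python-side os.environ lookup answer the same question.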


 bug 328047 is induced by a patch we add (it's not in upstream python).

 This patch comes from upstream.

 --
 Arfrever Frehtes Taifersar Arahesis




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-31 Thread Petteri Räty
On 07/31/2010 11:10 PM, Arfrever Frehtes Taifersar Arahesis wrote:

 If the variable is set but not exported then it is local to the shell
 env.  When bash goes to exec() python the local shell variables are
 not in the env; so os.environ will not contain them.

 anta...@kyoto ~ $ foo=BAR
 anta...@kyoto ~ $ echo $foo
 BAR
 anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
 None
 anta...@kyoto ~ $ export foo
 anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
 BAR
 
 I want only variables exported to Python processes.
 

export -p

Petteri





Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-31 Thread James Cloos
 "PH" == Paweł Hajdan, phajdan...@gentoo.org writes:

PH> Another thing we can consider is making UTF8 the default setup in
PH> Gentoo. I think most people (including me) don't care whether it's
PH> C or UTF8 as long as it works.

Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or
a POSIX.UTF-8 locale.

That should be done upstream in glibc, but were they to refuse then
Gentoo should add it to the glibc ebuild.

The language_country locales are just wrong for root.  They are often
broken (locales like en_US force case-insensitive collation, meaning that
a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files
which one would not expect to match) and cause bugs.

In fact, glibc's insistence that C and POSIX are ascii rather than raw
unspecified eight bit is itself a bug.

Utf8 is nice, but forcing the lang_country locales on root is not.

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6



Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-31 Thread Mike Frysinger
On Saturday, July 31, 2010 17:39:27 James Cloos wrote:
 Paweł Hajdan writes:
  Another thing we can consider is making UTF8 the default setup in
  Gentoo. I think most people (including me) don't care whether it's
  C or UTF8 as long as it works.
 
 Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or
 a POSIX.UTF-8 locale.
 
 In fact, glibc's insistence that C and POSIX are ascii rather than raw
 unspecified eight bit is itself a bug.

yeah, no.  take it up with the POSIX group where they're already working on 
defining a C.UTF-8/etc... locale.

 That should be done upstream in glibc, but were they to refuse then
 Gentoo should add it to the glibc ebuild.

this doesnt really make sense, upstream or down.  if you wanted to talk about 
setting default LANG in the baselayout, then that's about the only reasonable 
possibility (especially since we already do this to a degree).  screwing with 
default locale when no locale variables are set is madness.

 The language_country locales are just wrong for root.  They are often
 broken (locales like en_US force case-insensitive collation, meaning that
 a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files
 which one would not expect to match) and cause bugs.

this is pure opinion
-mike




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-31 Thread Mike Frysinger
On Saturday, July 31, 2010 18:14:50 James Cloos wrote:
 Mike Frysinger writes:
  screwing with default locale when no locale variables are set is
  madness.
 
 I never said anything about changing C or POSIX.  Only about creating
 C.UTF-8 and/or POSIX.UTF-8.

sorry, i misread.  thought you were talking about changing default behavior 
and not just the creation of new locales.

  The language_country locales are just wrong for root.
 
  this is pure opinion
 
 Expert opinion.

i'm sure you're of that opinion ;).  my point was that the default isnt going 
to change in Gentoo that doesnt go through glibc, and that is most likely to 
not change either.
-mike




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-30 Thread Harald van Dijk
On Fri, Jul 30, 2010 at 01:16:18AM +0200, Arfrever Frehtes Taifersar Arahesis 
wrote:
 We received too many invalid bugs caused by unsupported locales. 
 python_pkg_setup() needs to check
 locale and print error (using eerror(), without die()), when unsupported 
 locale has been detected.

I'm strongly with Brian on this. You receive too many valid bug reports
caused by a broken package.  python_pkg_setup needs to do nothing. You
need to fix the bugs, or if fixing them is too much of an issue, work
around them in the ebuild. Keep in mind that having no locale explicitly
selected is the default for a Gentoo installation, and that the docs do
not (and should not) say anywhere that non-UTF-8 locales are unsupported.
In fact, quoting from
  http://www.gentoo.org/doc/en/guide-localization.xml:

It's also possible, and pretty common especially in a more traditional
 UNIX environment, to leave the global settings unchanged, i.e. in the
 C locale. Users can still specify their preferred locale in their own
 shell RC file:



Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-30 Thread Paweł Hajdan, Jr.
On 7/29/10 8:48 PM, Brian Harring wrote:
 It's basically annoying people into changing to partially 
 sidestep a couple of bugs, instead of fixing the issue- and that's the 
 wrong course of action.

I think that with python earlier than python-3 unicode handling is quite
complicated, and I'm not surprised there are problems with that.

Arfrever, does python-3 have the same problem with non-UTF8 locales?

Another thing we can consider is making UTF8 the default setup in
Gentoo. I think most people (including me) don't care whether it's C or
UTF8 as long as it works.

Paweł





Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-30 Thread Brian Harring
On Fri, Jul 30, 2010 at 09:49:21AM -0700, Paweł Hajdan, Jr. wrote:
 On 7/29/10 8:48 PM, Brian Harring wrote:
  It's basically annoying people into changing to partially 
  sidestep a couple of bugs, instead of fixing the issue- and that's the 
  wrong course of action.
 
 I think that with python earlier than python-3 unicode handling is quite
 complicated, and I'm not surprised there are problems with that.

encoding handling wasn't that bad under py2k.  Py3k just enforces the 
boundaries- meaning you can't just skid by.

 Arfrever, does python-3 have the same problem with non-UTF8 locales?

ascii is a subset of utf-8 and ascii is a subset of latin-1; latin-1 
and utf-8 aren't compatible in encoded form however.

What this means is that the same set of bugs I ran down still will go 
boom if you have a utf-8 locale and the code in question was dealing 
w/ a latin-1 encoded file.
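
A small sketch of the subset relationship described above, using only the standard codecs. The sample word is arbitrary and the helper decodes_as_utf8 is named here purely for illustration.

```python
text = "caf\u00e9"                      # "cafe" with an acute accent on the e
latin1_bytes = text.encode("latin-1")   # one byte for the accented letter
utf8_bytes = text.encode("utf-8")       # two bytes for the accented letter

# Pure ASCII text has the same byte representation in all three encodings.
assert "cafe".encode("ascii") == "cafe".encode("latin-1") == "cafe".encode("utf-8")


def decodes_as_utf8(data):
    # True if the byte string is valid UTF-8, False otherwise.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(utf8_bytes))    # valid UTF-8
print(decodes_as_utf8(latin1_bytes))  # latin-1 bytes are rejected by a UTF-8 decoder
# Decoding UTF-8 bytes as latin-1 does not fail, it silently yields wrong text.
print(ascii(utf8_bytes.decode("latin-1")))
```

So a UTF-8 locale only moves the failure: code that guesses the encoding from the locale still breaks on a latin-1 file.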


 Another thing we can consider is making UTF8 the default setup in
 Gentoo. I think most people (including me) don't care whether it's C or
 UTF8 as long as it works.

"as long as it works" in this case means "fix the code as I've laid 
out".  Forcing locales to sidestep it leaves the latin-1/utf8 
incompatibility to go 'boom'.

Basically, forcing utf8 doesn't make it work.  It reduces the cases 
breakage will show up while leaving those issues still there- frankly 
this is worse, can't fix those screwups without them breaking (for 
better or worse, and preferably breaking in a testcase).  We've got 4 
bugs, and only one of them is a semi complex fix (docutils needs to 
require that html it's fed is utf8 compatible- valid enough req 
anyways since html shouldn't be latin-1, it should be ascii or utf8).

So.. get fixing, instead of dodging the work imo. ;)

~brian




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-30 Thread Mike Frysinger
On Thursday, July 29, 2010 19:16:42 Arfrever Frehtes Taifersar Arahesis wrote:
 We received too many invalid bugs caused by unsupported locales.
 python_pkg_setup() needs to check locale and print error (using eerror(),
 without die()), when unsupported locale has been detected.

there is no such thing as an unsupported locale.  only buggy code you should 
be fixing and not dumping onto users.  i wish i could mark all my glibc bugs 
as invalid because i didnt feel like fixing them.
-mike




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-29 Thread Paweł Hajdan, Jr.
On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
 
 --- python.eclass
 +++ python.eclass
 @@ -355,6 +355,8 @@
   # Check if phase is pkg_setup().
   [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
  
 + local locale
 +
   if [[ $# -ne 0 ]]; then
   die "${FUNCNAME}() does not accept arguments"
   fi
 @@ -407,6 +409,16 @@
   unset -f python_pkg_setup_check_USE_flags
   fi

nit: Why not declare local locale here, close to its usage?

 + locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
 + if [[ ${locale} != *.UTF-8 ]]; then
 + eerror
 + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
 + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
 + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
 + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
 + eerror
 + fi
 +
   PYTHON_PKG_SETUP_EXECUTED=1
  }
  






Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-29 Thread Arfrever Frehtes Taifersar Arahesis
2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
 On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
  
  --- python.eclass
  +++ python.eclass
  @@ -355,6 +355,8 @@
  # Check if phase is pkg_setup().
  [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
   
  +   local locale
  +
  if [[ $# -ne 0 ]]; then
  die "${FUNCNAME}() does not accept arguments"
  fi
  @@ -407,6 +409,16 @@
  unset -f python_pkg_setup_check_USE_flags
  fi
 
 nit: Why not declare local locale here, close to its usage?

It's consistent with style used in python.eclass.

  +   locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
  +   if [[ ${locale} != *.UTF-8 ]]; then
  +   eerror
  +   eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
  +   eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
  +   eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
  +   eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
  +   eerror
  +   fi
  +
  PYTHON_PKG_SETUP_EXECUTED=1
   }
   

-- 
Arfrever Frehtes Taifersar Arahesis




Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-29 Thread Brian Harring
On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis 
wrote:
 --- python.eclass
 +++ python.eclass
 @@ -355,6 +355,8 @@
   # Check if phase is pkg_setup().
   [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
  
 + local locale
 +
   if [[ $# -ne 0 ]]; then
   die "${FUNCNAME}() does not accept arguments"
   fi
 @@ -407,6 +409,16 @@
   unset -f python_pkg_setup_check_USE_flags
   fi
  
 + locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')

You're using python to get the exported env.  Don't.  Use bash (you're 
invoking python from freaking bash after all)...

 + if [[ ${locale} != *.UTF-8 ]]; then
 + eerror
 + eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
 + eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
 + eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
 + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
 + eerror

For cases such as this, ewarn, not eerror.  It's not an actual error, 
it's a potential source of problems people may see.

The more I look into this issue, the more I'm convinced it's not user 
settings that are problem- the problem is in the code, not in user 
env.  You've stated in a couple of places that C/Posix locales are 
not supported, which frankly is very whacked- that's not really a 
proclamation you can make on your own for python, and you're actually 
ignoring that this problem would just as easily rear its head with a 
latin-1 encoded file.


Take a look at 302425; the traceback in that is a classic example of 
where they *should* be using bytes mode (they don't need to interpret 
the data, just write the script across, thus bytes).
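
As a rough sketch of what "bytes mode" means here (this is not the code from the bug report; copy_script and the file names are invented for the example), copying a script without ever decoding it looks like this:

```python
import os
import tempfile


def copy_script(src, dst):
    # Binary mode: the bytes are written across unchanged, so no codec
    # (and therefore no locale) is involved.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())


# Usage: a script whose body is latin-1 encoded, invalid as UTF-8 or ASCII.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "script.in")
dst = os.path.join(workdir, "script")
payload = b"#!/bin/sh\n# caf\xe9\necho hello\n"
with open(src, "wb") as fp:
    fp.write(payload)
copy_script(src, dst)
with open(dst, "rb") as fp:
    copied = fp.read()
print(copied == payload)
```

Because nothing is decoded, the copy behaves the same under C, latin-1 and UTF-8 locales.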

bug 328047 is induced by a patch we add (it's not in upstream python).  
The code in question also is invoking fricking ldd a few steps prior 
which is questionable in multiple ways: either way, relevant chunk is
+os.system("ldd %s > %s" % (do_readline, tmpfile))
+fp = open(tmpfile)
+for ln in fp:

So... roughly, it invokes os.system, which will pass the environment 
straight through to it, meaning locale gets passed down.

Then it opens the file.  Note it specifies *NO ENCODING* nor is there 
actually an enforced locale best I can tell, thus ascii being the 
default.  The screwup here is in our patches- said patches should be 
forcing posix locale for the ldd call (resulting in ascii).  If you 
think through this bug, we've seen this multiple times in grep/sed 
calls- this is literally no different.
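
One way to force the POSIX locale for an external call, sketched with subprocess rather than the os.system() plus temporary file used in the patch. The helper name run_in_c_locale is an assumption of this example, and the usage line runs a stand-in child instead of ldd so that it works anywhere.

```python
import os
import subprocess
import sys


def run_in_c_locale(argv):
    # Override LC_ALL for the child only; the caller's environment is untouched.
    env = dict(os.environ, LC_ALL="C")
    output = subprocess.check_output(argv, env=env)
    # With LC_ALL=C the tool's messages are expected to be plain ASCII;
    # "replace" keeps the sketch from raising on stray bytes such as file names.
    return output.decode("ascii", "replace")


# Usage with a stand-in child that reports the LC_ALL it was given.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_in_c_locale(child).strip())
```

For the case discussed above the call would be something like run_in_c_locale(["ldd", do_readline]), which also removes the need for the temporary file.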

bug 287439 is a screw up in the programs source... should've been 
using bytes (non arguable).  Matter of fact, while generally I think 
Tarek knows what the hell he's doing, the skip they added to the 
tests ignored an actual valid bug in setuptools/distribute- shebangs 
from the standpoint of the kernel need to be consistent.  Thus reading 
the shebang line itself should be done in bytes, then converted to 
ascii and interpreted- they tried opening the file (in whole) in 
bytes, meaning they tried enforcing ascii across the whole buffer- 
not just the first line.  Program bug.
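
A minimal sketch of reading only the shebang line in bytes and decoding just that line. read_shebang is a hypothetical helper, not the setuptools/distribute code, and treating the interpreter line as ASCII is an assumption of the sketch.

```python
import os
import tempfile


def read_shebang(path):
    # Read only the first line, as bytes, so the rest of the file is never decoded.
    with open(path, "rb") as fp:
        first_line = fp.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Decode just the interpreter line; ASCII is an assumption of this sketch.
    return first_line[2:].strip().decode("ascii")


# Usage: the body contains a latin-1 byte that would break a whole-file decode.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n")
shebang = read_shebang(path)
os.remove(path)
print(shebang)
```

The non-ASCII byte further down the file never reaches a decoder, so the locale has no influence on the result.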

These bugs I got via searching for 'ALL python locale', and 
identifying the ones that were actually locale related.  I've at this 
point looked into the source of 3 bugs- meaning literally, 3 bugs 
checked into, 3 instances where the code was wrong.

I'll leave it as an exercise for others to keep digging, but the point 
here is that the programs themselves screwup their locale handling- 
trying to force all systems to use a utf-8 locale for the env is just 
a hack instead of fixing the actual issue.  A pretty bad hack 
considering I've spent all of 30 minutes digging into this and rooting 
out the actual flaws in the src I might add.

For shits and giggles, let's add one more bug in- one that has the 
potential of rearing its head in random consuming pkgs, bug 322425 
(docutils's build_html being flawed), their encoding handling is 
intrinsically flawed.  The encoding of a file they're 
installing/parsing should be determined by the file itself- not 
attempting to arbitrarily force it to whatever locale the user happens 
to be running (which is exactly the first thing buildhtml.py attempts, 
literally `locale.setlocale(locale.LC_ALL, '')` at line 20).  The 
issue is not people using ascii locales, the issue is that these tools 
do not handle encoding correctly.
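
To illustrate the point that the encoding belongs to the file rather than to the user's locale, here is a small sketch that opens a file with an explicitly stated encoding. read_text is a made-up helper and has nothing to do with the docutils sources.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    # The caller states the file's encoding; nothing is taken from the locale.
    with io.open(path, encoding=encoding) as fp:
        return fp.read()


# Usage: write a latin-1 encoded file, then read it back as latin-1.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write("caf\u00e9\n".encode("latin-1"))
content = read_text(path, "latin-1")
os.remove(path)
print(ascii(content))
```

How the right encoding is discovered (a declaration in the file, project metadata, a command-line option) is a separate question; the sketch only shows that it is passed explicitly.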

Recall, one of the purposes of py3k going bytes vs text (aka unicode) 
was to make clear that textual data's encoding need be known.  All of 
this code isn't actually forcing/handling the encoding for the data 
they deal in- meaning these are 

Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-29 Thread Paweł Hajdan, Jr.
On 7/29/10 7:29 PM, Arfrever Frehtes Taifersar Arahesis wrote:
 2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
 nit: Why not declare local locale here, close to its usage?
 It's consistent with style used in python.eclass.

Fine for me then. Thanks for explaining.

Paweł





Re: [gentoo-dev] Locale check in python_pkg_setup()

2010-07-29 Thread Brian Harring
On Fri, Jul 30, 2010 at 05:15:19AM +0200, Krzysztof Pawlik wrote:
 On 07/30/10 01:16, Arfrever Frehtes Taifersar Arahesis wrote:
  +   eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
 
 I'm with Brian on this one - my locale (C/POSIX) is not broken, it's the code
 that has bugs. Can you please change wording here to read something along "...
 for information on switching to Unicode locale." instead of suggesting that
 user's locale is broken.

From where I'm sitting, the only ebuild that has any business telling 
me to change (or suggesting how) locale is glibc.  Especially when 
we're talking about a warning that will be in 7.6% of the versions
in the tree.

That's pretty freaking spammy... end result will be people switching 
(for better or worse) to stop seeing the complaints.

It's basically annoying people into changing to partially 
sidestep a couple of bugs, instead of fixing the issue- and that's the 
wrong course of action.

~brian

