Re: [gentoo-dev] Locale check in python_pkg_setup()
2010-07-31 22:25:26 Petteri Räty wrote:
> On 07/31/2010 11:10 PM, Arfrever Frehtes Taifersar Arahesis wrote:
>>> If the variable is set but not exported then it is local to the shell env. When bash goes to exec() python the local shell variables are not in the env; so os.environ will not contain them.
>>>
>>>     anta...@kyoto ~ $ foo=BAR
>>>     anta...@kyoto ~ $ echo $foo
>>>     BAR
>>>     anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
>>>     None
>>>     anta...@kyoto ~ $ export foo
>>>     anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
>>>     BAR
>>
>> I want only variables exported to Python processes.
>
> export -p

It would have to be parsed using e.g. grep and sed. It's easier to call Python in this case.
The call to Python is sufficiently fast:

    $ time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))' > /dev/null

    real    0m0.062s
    user    0m0.051s
    sys     0m0.011s

-- 
Arfrever Frehtes Taifersar Arahesis
Re: [gentoo-dev] Locale check in python_pkg_setup()
A milder warning will be printed.

-- 
Arfrever Frehtes Taifersar Arahesis

--- python.eclass
+++ python.eclass
@@ -355,6 +355,8 @@
     # Check if phase is pkg_setup().
     [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
 
+    local locale
+
     if [[ $# -ne 0 ]]; then
         die "${FUNCNAME}() does not accept arguments"
     fi
@@ -407,6 +409,15 @@
         unset -f python_pkg_setup_check_USE_flags
     fi
 
+    if [[ $(locale charmap) != UTF-8 ]]; then
+        locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
+        ewarn
+        ewarn "Currently used locale '${locale}' can cause UnicodeDecodeError or UnicodeEncodeError"
+        ewarn "exceptions. It is recommended to use a UTF-8 locale to avoid problems."
+        ewarn "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to change locale."
+        ewarn
+    fi
+
     PYTHON_PKG_SETUP_EXECUTED=1
 }
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Mon, Aug 02, 2010 at 11:02:20PM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> It would have to be parsed using e.g. grep and sed. It's easier to call Python in this case.

It's even easier not to.

> The call to Python is sufficiently fast:
>
>     $ time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))' > /dev/null
>
>     real    0m0.062s
>     user    0m0.051s
>     sys     0m0.011s

Let's compare. On my system:

    time python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))'
    en_GB.UTF-8

    real    0m0.020s
    user    0m0.016s
    sys     0m0.004s

    time sh -c 'echo ${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}}'
    en_GB.UTF-8

    real    0m0.001s
    user    0m0.000s
    sys     0m0.000s

And that's after several runs for both, so the difference is not caused by the initial load of python when it wasn't in memory yet. Yes, 0.019s is very little, but in this case I see absolutely no benefit whatsoever in calling python. Plus sh has the advantage of actually working when LC_ALL is exported as "" (which in LC_* means the same as having it unset)...

But why exactly are you concerned about LC_* being defined but not exported anyway? You're checking from an ebuild; locales are going to get inherited from portage or profile.env anyway, so you can just assume that if they _are_ set, they're exported. The only way they might not be is if the user is messing with the locale from the bashrc, and if the user's doing that, the user really needs to fix the bashrc and export the vars anyway.

None of this changes the fact that the locale check warns about bugs in packages, not bugs in the user's configuration.
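The empty-LC_ALL point above can be sketched in Python. This is only an illustration, not code from the eclass: the function names `effective_ctype_locale` and `nested_get` are made up here, and the lookup order LC_ALL, LC_CTYPE, LANG is simply the one used in the one-liners quoted in this thread.

```python
def effective_ctype_locale(env):
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG.

    Empty values are skipped, mirroring the shell expansion
    ${LC_ALL:-${LC_CTYPE:-${LANG:-POSIX}}}.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # skips both missing and empty-string values
            return value
    return "POSIX"


def nested_get(env):
    """The nested .get() form from the thread: a variable that is
    present but empty still wins over the later fallbacks."""
    return env.get("LC_ALL", env.get("LC_CTYPE", env.get("LANG", "POSIX")))


# Hypothetical environment: LC_ALL exported as an empty string.
env = {"LC_ALL": "", "LANG": "en_GB.UTF-8"}
print(repr(effective_ctype_locale(env)))  # 'en_GB.UTF-8'
print(repr(nested_get(env)))              # ''
```

Passing `os.environ` instead of the sample dictionary would apply the same lookup to the real environment.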
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Mon, 2 Aug 2010 23:18:59 +0200 Arfrever Frehtes Taifersar Arahesis <arfre...@gentoo.org> wrote:
> A milder warning will be printed.

I distinctly remember several voices being raised in this thread very recently, suggesting if not demanding that you should not convey a message like that at all, but fix the affected packages instead.

jer
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Mon, 2 Aug 2010 23:18:59 +0200 Arfrever Frehtes Taifersar Arahesis <arfre...@gentoo.org> wrote:
> + ewarn "exceptions. It is recommended to use a UTF-8 locale to avoid problems."
> + ewarn "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to change locale."

In fact the documentation you point to positively encourages users/admins to set up locales and explains how to do it system-wide, and in no place does it warn against any adverse effects of doing so. So you can't even point to that documentation in defence of this milder warning.

jer
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Mon, Aug 02, 2010 at 11:18:59PM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> A milder warning will be printed.

Guessing you didn't get the "no warning should be put in" part that everyone stated?

You're ignoring that this message will also make users think that switching their locale will magically fix programs that chuck encoding errors (validly so, if not particularly user friendly) when running into improperly encoded files (regardless of locale).

This locale crap doesn't belong in the tree, mild warning or not. Do not add it. Take it up to the council if you really think everyone else is wrong and still want it.

~harring
Re: [gentoo-dev] Locale check in python_pkg_setup()
2010-07-30 04:36:22 Brian Harring wrote:
> On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
>> --- python.eclass
>> +++ python.eclass
>> @@ -355,6 +355,8 @@
>>      # Check if phase is pkg_setup().
>>      [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>>
>> +    local locale
>> +
>>      if [[ $# -ne 0 ]]; then
>>          die "${FUNCNAME}() does not accept arguments"
>>      fi
>> @@ -407,6 +409,16 @@
>>          unset -f python_pkg_setup_check_USE_flags
>>      fi
>>
>> +    locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
>
> You're using python to get the exported env. Don't. Use bash (you're invoking python from freaking bash after all)...

Given variable can be set, but not exported.

> bug 328047 is induced by a patch we add (it's not in upstream python).

This patch comes from upstream.

-- 
Arfrever Frehtes Taifersar Arahesis
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Sat, Jul 31, 2010 at 7:44 AM, Arfrever Frehtes Taifersar Arahesis <arfre...@gentoo.org> wrote:
> 2010-07-30 04:36:22 Brian Harring wrote:
>> On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
>>> --- python.eclass
>>> +++ python.eclass
>>> @@ -355,6 +355,8 @@
>>>      # Check if phase is pkg_setup().
>>>      [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>>>
>>> +    local locale
>>> +
>>>      if [[ $# -ne 0 ]]; then
>>>          die "${FUNCNAME}() does not accept arguments"
>>>      fi
>>> @@ -407,6 +409,16 @@
>>>          unset -f python_pkg_setup_check_USE_flags
>>>      fi
>>>
>>> +    locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
>>
>> You're using python to get the exported env. Don't. Use bash (you're invoking python from freaking bash after all)...
>
> Given variable can be set, but not exported.

If the variable is set but not exported then it is local to the shell env. When bash goes to exec() python the local shell variables are not in the env; so os.environ will not contain them.

    anta...@kyoto ~ $ foo=BAR
    anta...@kyoto ~ $ echo $foo
    BAR
    anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
    None
    anta...@kyoto ~ $ export foo
    anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
    BAR

so how is this any different than:

    [[ -n $LC_CTYPE ]] && locale=$LC_CTYPE
    [[ -n $LC_ALL ]] && locale=$LC_ALL
    locale=${locale:-POSIX}

if you want to keep it short; or the longer version with more ifs and less shell magic. Normally I'm not a big performance man myself; but this is in an eclass used by lots of packages; not just one ebuild.

>> bug 328047 is induced by a patch we add (it's not in upstream python).
>
> This patch comes from upstream.
>
> --
> Arfrever Frehtes Taifersar Arahesis
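The set-versus-exported behaviour shown in the shell session above can be reproduced from Python 3 as well. The sketch below is an illustration only: it assumes a POSIX `sh` is on the PATH, and the helper name `child_sees` is invented for this example. It starts a shell, assigns `foo` with or without `export`, and asks a child Python process what it finds in `os.environ`.

```python
import os
import shlex
import subprocess
import sys

# Code run by the child interpreter: report what it sees for "foo".
CHILD = 'import os; print(os.environ.get("foo"))'


def child_sees(shell_prefix):
    """Run `shell_prefix` in sh, then a child Python; return the child's output."""
    script = "{}; {} -c {}".format(
        shell_prefix, shlex.quote(sys.executable), shlex.quote(CHILD)
    )
    # Start from the current environment, but make sure foo is not inherited.
    env = {k: v for k, v in os.environ.items() if k != "foo"}
    result = subprocess.run(
        ["sh", "-c", script], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


unexported = child_sees("foo=BAR")            # shell variable only
exported = child_sees("foo=BAR; export foo")  # placed in the child's environment
print(unexported, exported)
```

A variable that is set but not exported never reaches the child, so the nested `os.environ.get()` call and a plain shell expansion can only disagree for such shell-local variables.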
Re: [gentoo-dev] Locale check in python_pkg_setup()
On 07/31/2010 11:10 PM, Arfrever Frehtes Taifersar Arahesis wrote:
>> If the variable is set but not exported then it is local to the shell env. When bash goes to exec() python the local shell variables are not in the env; so os.environ will not contain them.
>>
>>     anta...@kyoto ~ $ foo=BAR
>>     anta...@kyoto ~ $ echo $foo
>>     BAR
>>     anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
>>     None
>>     anta...@kyoto ~ $ export foo
>>     anta...@kyoto ~ $ python -c 'import os; print os.environ.get("foo")'
>>     BAR
>
> I want only variables exported to Python processes.

export -p

Petteri
Re: [gentoo-dev] Locale check in python_pkg_setup()
>>>>> "PH" == Paweł Hajdan, <phajdan...@gentoo.org> writes:

PH> Another thing we can consider is making UTF8 the default setup in
PH> Gentoo. I think most people (including me) don't care whether it's
PH> C or UTF8 as long as it works.

Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or a POSIX.UTF-8 locale.

That should be done upstream in glibc, but were they to refuse then Gentoo should add it to the glibc ebuild.

The language_country locales are just wrong for root. They are often broken (locales like en_US force case-insensitive collation, meaning that a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files which one would not expect to match) and cause bugs.

In fact, glibc's insistence that C and POSIX are ascii rather than raw unspecified eight bit is itself a bug.

Utf8 is nice, but forcing the lang_country locales on root is not.

-JimC
-- 
James Cloos <cl...@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Saturday, July 31, 2010 17:39:27 James Cloos wrote:
> Paweł Hajdan writes:
>> Another thing we can consider is making UTF8 the default setup in Gentoo. I think most people (including me) don't care whether it's C or UTF8 as long as it works.
>
> Forcing utf-8 will only be reasonable when there is a C.UTF-8 and/or a POSIX.UTF-8 locale.
>
> In fact, glibc's insistence that C and POSIX are ascii rather than raw unspecified eight bit is itself a bug.

yeah, no. take it up with the POSIX group where they're already working on defining a C.UTF-8/etc... locale.

> That should be done upstream in glibc, but were they to refuse then Gentoo should add it to the glibc ebuild.

this doesn't really make sense, upstream or down. if you wanted to talk about setting default LANG in the baselayout, then that's about the only reasonable possibility (especially since we already do this to a degree). screwing with default locale when no locale variables are set is madness.

> The language_country locales are just wrong for root. They are often broken (locales like en_US force case-insensitive collation, meaning that a command such as 'rm [a-z]*' will unlink(2) 'Makefile' and similar files which one would not expect to match) and cause bugs.

this is pure opinion
-mike
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Saturday, July 31, 2010 18:14:50 James Cloos wrote:
> Mike Frysinger writes:
>> screwing with default locale when no locale variables are set is madness.
>
> I never said anything about changing C or POSIX. Only about creating C.UTF-8 and/or POSIX.UTF-8.

sorry, i misread. thought you were talking about changing default behavior and not just the creation of new locales.

>>> The language_country locales are just wrong for root.
>>
>> this is pure opinion
>
> Expert opinion.

i'm sure you're of that opinion ;). my point was that the default isn't going to change in Gentoo in any way that doesn't go through glibc, and glibc is most likely not going to change either.
-mike
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Fri, Jul 30, 2010 at 01:16:18AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> We received too many invalid bugs caused by unsupported locales. python_pkg_setup() needs to check locale and print error (using eerror(), without die()), when unsupported locale has been detected.

I'm strongly with Brian on this. You receive too many valid bug reports caused by a broken package. python_pkg_setup needs to do nothing. You need to fix the bugs, or if fixing them is too much of an issue, work around them in the ebuild.

Keep in mind that having no locale explicitly selected is the default for a Gentoo installation, and that the docs do not (and should not) say anywhere that non-UTF-8 locales are unsupported. In fact, quoting from http://www.gentoo.org/doc/en/guide-localization.xml:

    It's also possible, and pretty common especially in a more traditional UNIX environment, to leave the global settings unchanged, i.e. in the "C" locale. Users can still specify their preferred locale in their own shell RC file:
Re: [gentoo-dev] Locale check in python_pkg_setup()
On 7/29/10 8:48 PM, Brian Harring wrote:
> It's basically annoying people into changing to partially sidestep a couple of bugs, instead of fixing the issue- and that's the wrong course of action.

I think that with python earlier than python-3 unicode handling is quite complicated, and I'm not surprised there are problems with that. Arfrever, does python-3 have the same problem with non-UTF8 locales?

Another thing we can consider is making UTF8 the default setup in Gentoo. I think most people (including me) don't care whether it's C or UTF8 as long as it works.

Paweł
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Fri, Jul 30, 2010 at 09:49:21AM -0700, Paweł Hajdan, Jr. wrote:
> On 7/29/10 8:48 PM, Brian Harring wrote:
>> It's basically annoying people into changing to partially sidestep a couple of bugs, instead of fixing the issue- and that's the wrong course of action.
>
> I think that with python earlier than python-3 unicode handling is quite complicated, and I'm not surprised there are problems with that.

encoding handling wasn't that bad under py2k. Py3k just enforces the boundaries- meaning you can't just skid by.

> Arfrever, does python-3 have the same problem with non-UTF8 locales?

ascii is a subset of utf-8 and ascii is a subset of latin-1; latin-1 and utf-8 aren't compatible in encoded form however. What this means is that the same set of bugs I ran down still will go boom if you have a utf-8 locale and the code in question was dealing w/ a latin-1 encoded file.

> Another thing we can consider is making UTF8 the default setup in Gentoo. I think most people (including me) don't care whether it's C or UTF8 as long as it works.

"as long as it works" in this case means fix the code as I've laid out. Forcing locales to sidestep it leaves the latin-1/utf8 incompatibility to go 'boom'.

Basically, forcing utf8 doesn't make it work. It reduces the cases where breakage will show up while leaving those issues still there- frankly this is worse, since you can't fix those screwups without them breaking (for better or worse, and preferably breaking in a testcase).

We've got 4 bugs, and only one of them is a semi complex fix (docutils needs to require that html it's fed is utf8 compatible- valid enough req anyways since html shouldn't be latin-1, it should be ascii or utf8).

So.. get fixing, instead of dodging the work imo. ;)

~brian
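The subset relationship described above is easy to check in Python 3. This is a small illustrative sketch; the sample word and the helper name `try_decode` are arbitrary choices for the example, not anything taken from the bugs discussed.

```python
def try_decode(data, encoding):
    """Decode bytes, returning None instead of raising on invalid input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Pure ASCII text has the same byte representation in all three encodings.
plain = "cafe"
same_bytes = plain.encode("ascii") == plain.encode("latin-1") == plain.encode("utf-8")

# One non-ASCII character is enough to make latin-1 and utf-8 diverge.
text = "caf\u00e9"
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'

print(same_bytes)                           # True
print(try_decode(latin1_bytes, "utf-8"))    # None: not valid UTF-8
print(try_decode(utf8_bytes, "ascii"))      # None: not ASCII either
print(repr(try_decode(utf8_bytes, "latin-1")))  # decodes, but to the wrong text
```

A UTF-8 locale therefore does nothing for code that reads a latin-1 file with the locale's default encoding: the failure only moves from the ASCII case to the UTF-8 case.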
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Thursday, July 29, 2010 19:16:42 Arfrever Frehtes Taifersar Arahesis wrote:
> We received too many invalid bugs caused by unsupported locales. python_pkg_setup() needs to check locale and print error (using eerror(), without die()), when unsupported locale has been detected.

there is no such thing as an unsupported locale. only buggy code you should be fixing and not dumping onto users.

i wish i could mark all my glibc bugs as invalid because i didn't feel like fixing them.
-mike
Re: [gentoo-dev] Locale check in python_pkg_setup()
On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
>      # Check if phase is pkg_setup().
>      [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>
> +    local locale
> +
>      if [[ $# -ne 0 ]]; then
>          die "${FUNCNAME}() does not accept arguments"
>      fi
> @@ -407,6 +409,16 @@
>          unset -f python_pkg_setup_check_USE_flags
>      fi

nit: Why not declare "local locale" here, close to its usage?

> +    locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
> +    if [[ ${locale} != *.UTF-8 ]]; then
> +        eerror
> +        eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> +        eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> +        eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> +        eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> +        eerror
> +    fi
> +
>      PYTHON_PKG_SETUP_EXECUTED=1
>  }
Re: [gentoo-dev] Locale check in python_pkg_setup()
2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
> On 7/29/10 4:16 PM, Arfrever Frehtes Taifersar Arahesis wrote:
>> --- python.eclass
>> +++ python.eclass
>> @@ -355,6 +355,8 @@
>>      # Check if phase is pkg_setup().
>>      [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>>
>> +    local locale
>> +
>>      if [[ $# -ne 0 ]]; then
>>          die "${FUNCNAME}() does not accept arguments"
>>      fi
>> @@ -407,6 +409,16 @@
>>          unset -f python_pkg_setup_check_USE_flags
>>      fi
>
> nit: Why not declare "local locale" here, close to its usage?

It's consistent with style used in python.eclass.

>> +    locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')
>> +    if [[ ${locale} != *.UTF-8 ]]; then
>> +        eerror
>> +        eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
>> +        eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
>> +        eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
>> +        eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
>> +        eerror
>> +    fi
>> +
>>      PYTHON_PKG_SETUP_EXECUTED=1
>>  }

-- 
Arfrever Frehtes Taifersar Arahesis
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Fri, Jul 30, 2010 at 01:16:42AM +0200, Arfrever Frehtes Taifersar Arahesis wrote:
> --- python.eclass
> +++ python.eclass
> @@ -355,6 +355,8 @@
>      # Check if phase is pkg_setup().
>      [[ ${EBUILD_PHASE} != setup ]] && die "${FUNCNAME}() can be used only in pkg_setup() phase"
>
> +    local locale
> +
>      if [[ $# -ne 0 ]]; then
>          die "${FUNCNAME}() does not accept arguments"
>      fi
> @@ -407,6 +409,16 @@
>          unset -f python_pkg_setup_check_USE_flags
>      fi
>
> +    locale=$(python -c 'import os; print(os.environ.get("LC_ALL", os.environ.get("LC_CTYPE", os.environ.get("LANG", "POSIX"))))')

You're using python to get the exported env. Don't. Use bash (you're invoking python from freaking bash after all)...

> +    if [[ ${locale} != *.UTF-8 ]]; then
> +        eerror
> +        eerror "Currently used locale '${locale}' is unsupported and can cause build-time or run-time"
> +        eerror "problems (usually UnicodeDecodeErrors or UnicodeEncodeErrors). Bugs caused by this locale"
> +        eerror "will be closed as invalid. It is recommended to use a UTF-8 locale to avoid problems."
> +        eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
> +        eerror

For cases such as this, ewarn, not eerror. It's not an actual error, it's a potential source of problems people may see.

The more I look into this issue, the more I'm convinced it's not user settings that are the problem- the problem is in the code, not in user env.

You've stated in a couple of places that C/POSIX locales are not supported, which frankly is very whacked- that's not really a proclamation you can make on your own for python, and you're actually ignoring that this problem would just as easily rear its head with a latin-1 encoded file.

Take a look at 302425; the traceback in that is a classic example of where they *should* be using bytes mode (they don't need to interpret the data, just write the script across, thus bytes).

bug 328047 is induced by a patch we add (it's not in upstream python).
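The "just write the script across, thus bytes" remark can be illustrated with a few lines of Python 3. This is a hypothetical sketch rather than the code from bug 302425; `copy_script` and the sample payload are made up for the example. Because nothing is decoded, the copy cannot raise UnicodeDecodeError whatever locale is active.

```python
import os
import tempfile


def copy_script(src, dst):
    """Copy a file byte for byte; no decoding, so the locale is irrelevant."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())


# A script containing a latin-1 byte (0xe9) that is not valid UTF-8 or ASCII.
payload = b"#!/bin/sh\n# caf\xe9\necho ok\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.sh")
    dst = os.path.join(tmp, "out.sh")
    with open(src, "wb") as handle:
        handle.write(payload)
    copy_script(src, dst)
    with open(dst, "rb") as handle:
        copied = handle.read()

print(copied == payload)  # True
```

Opening the same file in text mode would make the result depend on the default encoding, which is the locale dependence this message objects to.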
The code in question also is invoking fricking ldd a few steps prior, which is questionable in multiple ways. Either way, the relevant chunk is

    +os.system("ldd %s > %s" % (do_readline, tmpfile))
    +fp = open(tmpfile)
    +for ln in fp:

So... roughly, it invokes os.system, which will pass the environment straight through to it, meaning locale gets passed down. Then it opens the file. Note it specifies *NO ENCODING*, nor is there actually an enforced locale best I can tell, thus ascii being the default. The screwup here is in our patches- said patches should be forcing posix locale for the ldd call (resulting in ascii). If you think through this bug, we've seen this multiple times in grep/sed calls- this is literally no different.

bug 287439 is a screw up in the program's source... should've been using bytes (non arguable). Matter of fact, while generally I think Tarek knows what the hell he's doing, the skip they added to the tests ignored an actual valid bug in setuptools/distribute- shebangs from the standpoint of the kernel need to be consistent. Thus reading the shebang line itself should be done in bytes, then converted to ascii and interpreted- they tried opening the file (in whole) in bytes, meaning they tried enforcing ascii across the whole buffer- not just the first line. Program bug.

These bugs I got via searching for 'ALL python locale', and identifying the ones that were actually locale related. I've at this point looked into the source of 3 bugs- meaning literally, 3 bugs checked into, 3 instances where the code was wrong.

I'll leave it as an exercise for others to keep digging, but the point here is that the programs themselves screw up their locale handling- trying to force all systems to use a utf-8 locale for the env is just a hack instead of fixing the actual issue. A pretty bad hack considering I've spent all of 30 minutes digging into this and rooting out the actual flaws in the src I might add.
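The shebang handling suggested above (read the first line as bytes, decode only that line as ASCII) might look roughly like this in Python 3. It is a sketch under that reading of the message, not the setuptools/distribute code; `read_shebang` is a hypothetical helper.

```python
import io


def read_shebang(stream):
    """Return the interpreter line of a script, or None if it has none.

    `stream` must be opened in binary mode. Only the first line is
    decoded, and only as ASCII; the rest of the file is never interpreted.
    """
    first = stream.readline()
    if not first.startswith(b"#!"):
        return None
    return first[2:].strip().decode("ascii")


# The body contains a latin-1 byte, but only the first line is decoded.
script = io.BytesIO(
    b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
)
print(read_shebang(script))  # /usr/bin/python
```

A shebang line that itself contains non-ASCII bytes would still raise UnicodeDecodeError here; whether that should be an error or be handled differently is a design decision this sketch leaves open.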
For shits and giggles, let's add one more bug in- one that has the potential of rearing its head in random consuming pkgs, bug 322425 (docutils's build_html being flawed). Their encoding handling is intrinsically flawed. The encoding of a file they're installing/parsing should be determined by the file itself- not by attempting to arbitrarily force it to whatever locale the user happens to be running (which is exactly the first thing buildhtml.py attempts, literally `locale.setlocale(locale.LC_ALL, '')` at line 20).

The issue is not people using ascii locales, the issue is that these tools do not handle encoding correctly. Recall, one of the purposes of py3k going bytes vs text (aka unicode) was to make clear that textual data's encoding need be known. All of this code isn't actually forcing/handling the encoding for the data they deal in- meaning these are
Re: [gentoo-dev] Locale check in python_pkg_setup()
On 7/29/10 7:29 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> 2010-07-30 01:20:19 Paweł Hajdan, Jr. wrote:
>> nit: Why not declare "local locale" here, close to its usage?
>
> It's consistent with style used in python.eclass.

Fine for me then. Thanks for explaining.

Paweł
Re: [gentoo-dev] Locale check in python_pkg_setup()
On Fri, Jul 30, 2010 at 05:15:19AM +0200, Krzysztof Pawlik wrote:
> On 07/30/10 01:16, Arfrever Frehtes Taifersar Arahesis wrote:
>> + eerror "See http://www.gentoo.org/doc/en/utf-8.xml for information on how to fix locale."
>
> I'm with Brian on this one - my locale (C/POSIX) is not broken, it's the code that has bugs. Can you please change wording here to read something along "... for information on switching to Unicode locale." instead of suggesting that the user's locale is broken.

From where I'm sitting, the only ebuild that has any business telling me to change (or suggesting how) locale is glibc.

Especially when we're talking about a warning that will be in 7.6% of the versions in the tree. That's pretty freaking spammy... end result will be people switching (for better or worse) to stop seeing the complaints.

It's basically annoying people into changing to partially sidestep a couple of bugs, instead of fixing the issue- and that's the wrong course of action.

~brian